From d654a24e58aa6fce8fb30435349b165a99ae03a9 Mon Sep 17 00:00:00 2001
From: Kaylea Nelson
Date: Wed, 29 Nov 2023 14:55:58 -0500
Subject: [PATCH] Deployed 09999c1d with MkDocs version: 1.3.0

---
 data/hpc-storage/index.html | 1 -
 search/search_index.json    | 2 +-
 sitemap.xml.gz              | Bin 1062 -> 1062 bytes
 3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/data/hpc-storage/index.html b/data/hpc-storage/index.html
index 0fb223532..6401e6ed6 100644
--- a/data/hpc-storage/index.html
+++ b/data/hpc-storage/index.html
@@ -2474,7 +2474,6 @@

Purchased Storage

See below for details on purchasing storage. Purchased storage, if applicable, is located on the Gibbs filesystem in a /gpfs/gibbs/pi/ directory under the group's name.

Unlike project space described above, all files in your purchased storage count towards your quotas, regardless of file ownership.
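Because every file in the purchased allocation counts against the group's quota regardless of who owns it, it can be useful to see which subdirectories are consuming the space. The following is a minimal sketch using standard tools only; <group> is a placeholder for your group's directory name, and your cluster may also provide a dedicated quota-reporting command that returns the same information more quickly.

```bash
# Summarize usage per top-level subdirectory of the group's purchased storage.
# Replace <group> with your group's name; scanning a large tree can take a while.
du -sh /gpfs/gibbs/pi/<group>/* 2>/dev/null | sort -h
```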

-

All purchased storage

60-Day Scratch

Quota: 10 TiB and 15,000,000 files per group

60-day scratch is intended for storing temporary data. Any file in this space older than 60 days will automatically be deleted. We send out a weekly warning about files we expect to delete the following week. Like project storage, scratch quota is shared by your entire research group. If we begin to run low on storage, you may be asked to delete files younger than 60 days. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer-term space.
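To see which scratch files are approaching the 60-day limit before the automated purge removes them, a standard find command is enough. This is a minimal sketch: the path is a placeholder for your actual scratch directory, and -mtime +50 lists files not modified in the last 50 days so there is still time to copy anything you need to keep.

```bash
# List files within roughly ten days of the 60-day purge window, oldest first.
# Replace the path with your group's scratch directory.
find /path/to/your/scratch -type f -mtime +50 -printf '%TY-%Tm-%Td %p\n' | sort
```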

diff --git a/search/search_index.json b/search/search_index.json index 53f926b11..8a8096b2e 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Introduction The Yale Center for Research Computing provides support for research computing at Yale University. Our most active area for support is High Performance Computing, however we also support other computationally intensive research. In addition, we work with faculty and research groups across disciplines to design and maintain cost-effective computing capabilities. Introducing the McCleary HPC Cluster The YCRC is pleased to announce the new McCleary HPC cluster, which now serves researchers from the Yale School of Medicine, Yale Center for Genome Analysis and life scientists in the Faculty of Arts and Sciences! For more information, see our McCleary documentation . Get Help To best serve the research community, we provide one-on-one consulting and use a support tracking system. Troubleshooting Login Issues If you are experiencing issues logging into one of the clusters, please first check the current System Status for known issues and check the Troubleshoot Login guide first before seeking additional assistance. Web and Email Support To submit requests, issues, or questions please send us an email at hpc@yale.edu or sign on to our online support system at help.ycrc.yale.edu . Your login credentials there are your email and a password of your choosing, not your CAS password. Once received, our system will send an automated response with a link to a ticket. From there we'll track your ticket and make sure it's handled properly by the right person. Replies via email or the online support system go to the same place and are interchangeable. Constructive feedback is much appreciated. Office Hours via Zoom The YCRC hosts weekly office hours via Zoom on Wednesdays at 11am-12pm EST . Every Wednesday, Research support team members will be available to answer questions about the HPC clusters, data storage, cluster usage, etc. No appointments are necessary. Link: https://yale.zoom.us/my/ycrcsupport Phone: 203-432-9666 (2-ZOOM if on-campus)or 646 568 7788; Meeting ID: 224 666 8665 YouTube Channel The YCRC YouTube channel features recorded tutorials and workshops that cover a wide range of computing topics. New videos are added regularly and suggestions for topics can be submitted by emailing research.computing@yale.edu . One-on-One Support Research support team members are available by appointment for one-on-one support. See the table below for information about each person's area of particular focus. Please send requests for appointments with a team member to research.computing@yale.edu . If you have a general question or are unsure about who to meet with, include as much detail as possible about your request and we'll find the right person for you. Specialist Cluster(s) Areas of Focus Kathleen McKiernan All Getting Started Rob Bjornson, Ph.D. McCleary Life Sciences, Bioinformatics, Python, R Tom Langford, Ph.D. Grace / Milgram Physics, Python, MPI Aya Nawano, Ph.D. Grace Molecular Dynamics, Matlab, C/C++ Kaylea Nelson, Ph.D. Grace / Milgram Astronomy, EPS dept, MPI, Python Mike Rothberg, Ph.D. McCleary Computational Chemistry, Python, Matlab Michael Strickler, Ph.D. 
McCleary Life Sciences, Structural Biology Ping Luo Milgram Wu Tsai Institute, Psychology dept, Open OnDemand Andy Sherman, Ph.D. Grace MPI, GPUs Misha Guy, Ph.D. SRSC Software and Mathematica (email at mikhael.guy@yale.edu for appointment) Q&A Platform The YCRC hosts a Q&A platform at ask.cyberinfrastructure.org . Post questions about the clusters and receive answers from YCRC staff or even your peers! The sub-site for YCRC related questions is available at ask.cyberinfrastructure.org/g/Yale . Acknowledge the YCRC If publishing work performed on a YCRC cluster or with assistance from YCRC staff, we greatly appreciate acknowledgement of our staff and computing time in your publication. A list of YCRC staff can be found on our Staff page , and the clusters are summarized on our HPC Resources page . Example acknowledgement below: We thank the Yale Center for Research Computing, specifically [YCRC staff member name(s)], for guidance and assistance in computation run on the [cluster name here] cluster. Additionally, if you would be willing to send the publication information to research.computing@yale.edu , that would assist our efforts to capture work performed on YCRC resources and we can promote your work on our research.computing.yale.edu website.","title":"Introduction"},{"location":"#introduction","text":"The Yale Center for Research Computing provides support for research computing at Yale University. Our most active area for support is High Performance Computing, however we also support other computationally intensive research. In addition, we work with faculty and research groups across disciplines to design and maintain cost-effective computing capabilities. Introducing the McCleary HPC Cluster The YCRC is pleased to announce the new McCleary HPC cluster, which now serves researchers from the Yale School of Medicine, Yale Center for Genome Analysis and life scientists in the Faculty of Arts and Sciences! For more information, see our McCleary documentation .","title":"Introduction"},{"location":"#get-help","text":"To best serve the research community, we provide one-on-one consulting and use a support tracking system. Troubleshooting Login Issues If you are experiencing issues logging into one of the clusters, please first check the current System Status for known issues and check the Troubleshoot Login guide first before seeking additional assistance.","title":"Get Help"},{"location":"#web-and-email-support","text":"To submit requests, issues, or questions please send us an email at hpc@yale.edu or sign on to our online support system at help.ycrc.yale.edu . Your login credentials there are your email and a password of your choosing, not your CAS password. Once received, our system will send an automated response with a link to a ticket. From there we'll track your ticket and make sure it's handled properly by the right person. Replies via email or the online support system go to the same place and are interchangeable. Constructive feedback is much appreciated.","title":"Web and Email Support"},{"location":"#office-hours-via-zoom","text":"The YCRC hosts weekly office hours via Zoom on Wednesdays at 11am-12pm EST . Every Wednesday, Research support team members will be available to answer questions about the HPC clusters, data storage, cluster usage, etc. No appointments are necessary. 
Link: https://yale.zoom.us/my/ycrcsupport Phone: 203-432-9666 (2-ZOOM if on-campus)or 646 568 7788; Meeting ID: 224 666 8665","title":"Office Hours via Zoom"},{"location":"#youtube-channel","text":"The YCRC YouTube channel features recorded tutorials and workshops that cover a wide range of computing topics. New videos are added regularly and suggestions for topics can be submitted by emailing research.computing@yale.edu .","title":"YouTube Channel"},{"location":"#one-on-one-support","text":"Research support team members are available by appointment for one-on-one support. See the table below for information about each person's area of particular focus. Please send requests for appointments with a team member to research.computing@yale.edu . If you have a general question or are unsure about who to meet with, include as much detail as possible about your request and we'll find the right person for you. Specialist Cluster(s) Areas of Focus Kathleen McKiernan All Getting Started Rob Bjornson, Ph.D. McCleary Life Sciences, Bioinformatics, Python, R Tom Langford, Ph.D. Grace / Milgram Physics, Python, MPI Aya Nawano, Ph.D. Grace Molecular Dynamics, Matlab, C/C++ Kaylea Nelson, Ph.D. Grace / Milgram Astronomy, EPS dept, MPI, Python Mike Rothberg, Ph.D. McCleary Computational Chemistry, Python, Matlab Michael Strickler, Ph.D. McCleary Life Sciences, Structural Biology Ping Luo Milgram Wu Tsai Institute, Psychology dept, Open OnDemand Andy Sherman, Ph.D. Grace MPI, GPUs Misha Guy, Ph.D. SRSC Software and Mathematica (email at mikhael.guy@yale.edu for appointment)","title":"One-on-One Support"},{"location":"#qa-platform","text":"The YCRC hosts a Q&A platform at ask.cyberinfrastructure.org . Post questions about the clusters and receive answers from YCRC staff or even your peers! The sub-site for YCRC related questions is available at ask.cyberinfrastructure.org/g/Yale .","title":"Q&A Platform"},{"location":"#acknowledge-the-ycrc","text":"If publishing work performed on a YCRC cluster or with assistance from YCRC staff, we greatly appreciate acknowledgement of our staff and computing time in your publication. A list of YCRC staff can be found on our Staff page , and the clusters are summarized on our HPC Resources page . Example acknowledgement below: We thank the Yale Center for Research Computing, specifically [YCRC staff member name(s)], for guidance and assistance in computation run on the [cluster name here] cluster. Additionally, if you would be willing to send the publication information to research.computing@yale.edu , that would assist our efforts to capture work performed on YCRC resources and we can promote your work on our research.computing.yale.edu website.","title":"Acknowledge the YCRC"},{"location":"glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"news/","text":"News {{ blog_content }}","title":"News"},{"location":"news/#news","text":"{{ blog_content }}","title":"News"},{"location":"user-group/","text":"YCRC User Group The YCRC User Group is a community of researchers at Yale who utilize computing resources and technology to enable their research. You can join the User Group mailing list and forum where you can post questions or tips to other YCRC users at https://groups.io/g/ycrcusergroup .","title":"YCRC User Group"},{"location":"user-group/#ycrc-user-group","text":"The YCRC User Group is a community of researchers at Yale who utilize computing resources and technology to enable their research. You can join the User Group mailing list and forum where you can post questions or tips to other YCRC users at https://groups.io/g/ycrcusergroup .","title":"YCRC User Group"},{"location":"clusters/","text":"HPC Resources The YCRC maintains and supports a number of high performance computing systems for the Yale research community. Our high performance computing systems are named after notable members of the Yale community . Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details. For proposals, we provide a description of our facilities, equipment, and other resources for HPC and research computing . Compute We maintain and support three Red Hat Linux compute clusters, listed below. Please click on cluster names for more information. Info The Farnam and Ruddle clusters were both retired in 2023 and their users are now supported on the McCleary cluster. Cluster Name Approx. Core Count Approx. Node Count Login Address Purpose Grace 26,000 740 grace.ycrc.yale.edu general and highly parallel, tightly coupled (InfiniBand) McCleary 13,000 340 mccleary.ycrc.yale.edu medical and life science, YCGA Milgram 2,400 80 milgram.ycrc.yale.edu HIPAA and other sensitive data Storage We maintain several high performance storage systems. Listed below are these shared filesystems and the clusters where they are available. We distinguish where clusters store their home directories with an asterisk. The directory /home will always point to your home directory on the cluster you logged into. For more information about storage quotas and purchasing storage see the Cluster Storage page. Name Path Size Mounting Clusters File System Software Purpose Palmer /vast/palmer 700 TiB Grace*, McCleary* Vast home, scratch storage Gibbs /gpfs/gibbs 14.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) project, purchased project-style storage Slayman /gpfs/slayman 1.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) purchased project-style storage Milgram /gpfs/milgram 3.0 PiB Milgram* IBM Spectrum Scale (GPFS) Milgram primary storage YCGA /gpfs/ycga 3.0 PiB McCleary IBM Spectrum Scale (GPFS) YCGA storage","title":"Overview"},{"location":"clusters/#hpc-resources","text":"The YCRC maintains and supports a number of high performance computing systems for the Yale research community. Our high performance computing systems are named after notable members of the Yale community . Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details. For proposals, we provide a description of our facilities, equipment, and other resources for HPC and research computing .","title":"HPC Resources"},{"location":"clusters/#compute","text":"We maintain and support three Red Hat Linux compute clusters, listed below. 
Please click on cluster names for more information. Info The Farnam and Ruddle clusters were both retired in 2023 and their users are now supported on the McCleary cluster. Cluster Name Approx. Core Count Approx. Node Count Login Address Purpose Grace 26,000 740 grace.ycrc.yale.edu general and highly parallel, tightly coupled (InfiniBand) McCleary 13,000 340 mccleary.ycrc.yale.edu medical and life science, YCGA Milgram 2,400 80 milgram.ycrc.yale.edu HIPAA and other sensitive data","title":"Compute"},{"location":"clusters/#storage","text":"We maintain several high performance storage systems. Listed below are these shared filesystems and the clusters where they are available. We distinguish where clusters store their home directories with an asterisk. The directory /home will always point to your home directory on the cluster you logged into. For more information about storage quotas and purchasing storage see the Cluster Storage page. Name Path Size Mounting Clusters File System Software Purpose Palmer /vast/palmer 700 TiB Grace*, McCleary* Vast home, scratch storage Gibbs /gpfs/gibbs 14.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) project, purchased project-style storage Slayman /gpfs/slayman 1.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) purchased project-style storage Milgram /gpfs/milgram 3.0 PiB Milgram* IBM Spectrum Scale (GPFS) Milgram primary storage YCGA /gpfs/ycga 3.0 PiB McCleary IBM Spectrum Scale (GPFS) YCGA storage","title":"Storage"},{"location":"clusters/farnam/","text":"Farnam Farnam was a shared-use resource for the Yale School of Medicine (YSM). The Farnam Cluster was named for Louise Whitman Farnam , the first woman to graduate from the Yale School of Medicine, class of 1916. Farnam Retirement After more than six years in service, the Farnam HPC cluster was retired on June 1, 2023. Farnam was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Farnam"},{"location":"clusters/farnam/#farnam","text":"Farnam was a shared-use resource for the Yale School of Medicine (YSM). The Farnam Cluster was named for Louise Whitman Farnam , the first woman to graduate from the Yale School of Medicine, class of 1916. Farnam Retirement After more than six years in service, the Farnam HPC cluster was retired on June 1, 2023. Farnam was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Farnam"},{"location":"clusters/grace/","text":"Grace Grace is a shared-use resource for the Faculty of Arts and Sciences (FAS). It consists of a variety of compute nodes networked over low-latency InfiniBand and mounts several shared filesystems. The Grace cluster is is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper , who received her Ph.D. in Mathematics from Yale in 1934. Operating System Upgrade During the August 2023 maintenance, the operating system on Grace was upgraded from Red Hat 7 to Red Hat 8. For more information, see our Grace Operating System Upgrade page. Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. 
To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware Grace is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 2500 Maximum CPUs per user 1000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 97 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp devel Use the devel partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6126 24 174 skylake, avx512, 6126, nogpu, standard, common 4 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 252 Maximum CPUs per user 108 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 25 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum running jobs per user 2 Maximum CPUs per job 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 7642 8 237 epyc, 7642, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Info Interactive jobs ( salloc or Open OnDemand) are not allowed in the gpu partition. Please submit those jobs to gpu_devel . GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 24 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 10 Maximum GPUs per user 4 Maximum submitted jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 1 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 1 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 4 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. 
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 40 Maximum memory per user 4000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp mpi Use the mpi partition for tightly-coupled parallel programs that make efficient use of multiple nodes. See our MPI documentation if your workload fits this description. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 10000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 87 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 135 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 3 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp 16 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 74 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
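For example, a batch script for one of these GPU partitions might look like the minimal sketch below, shown here for the gpu partition; the same pattern applies to gpu_devel and the scavenge GPU partitions. The GPU type and resource values are taken from the tables above, the script body is a placeholder, and nvidia-smi is assumed to be available on the GPU nodes.

```bash
#!/bin/bash
#SBATCH --partition=gpu          # or gpu_devel / scavenge_gpu
#SBATCH --gpus=rtx2080ti:1       # GPUs must be requested explicitly
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=5G
#SBATCH --time=02:00:00          # stay within the partition's time limit

# Confirm the allocated GPU is visible to the job, then run your own code here.
nvidia-smi
```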
Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per user 30 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_mpi Use the scavenge_mpi partition to run preemptable jobs on more MPI resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the scavenge_mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_anticevic Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_anticevic partition are subject to the following limits: Limit Value Maximum job time limit 100-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 20 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_balou Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_balou partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 9 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 26 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_berry Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_berry partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_chem_chase Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_chem_chase partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_cowles Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_cowles partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 120 Maximum nodes per user 5 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 9 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_econ_io Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_io partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_econ_lp Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_lp partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 5 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_esi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_esi partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 648 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 36 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_fedorov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_fedorov partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp pi_gelernter Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gelernter partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_hammes_schiffer Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_hammes_schiffer partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 6 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest pi_hodgson Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hodgson partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_holland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_holland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_howard Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_howard partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_jorgensen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jorgensen partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_kim_theodore Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kim_theodore partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_korenaga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_korenaga partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_lederman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_lederman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6254 36 1505 rtx4000,rtx8000,v100 4,2,2 8,48,16 cascadelake, avx512, 6254, pi, bigtmp, rtx8000 pi_levine Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1952 Job Limits Jobs submitted to the pi_levine partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi pi_lora Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_lora partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 5 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp pi_mak Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_mak partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_manohar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_manohar partition are subject to the following limits: Limit Value Maximum job time limit 180-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 8 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 3 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_owen_miller Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_owen_miller partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 5 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_padmanabhan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_padmanabhan partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_panda Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_panda partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 3 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 pi_poland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_poland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 9 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_polimanti Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_polimanti partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_seto Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_seto partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp pi_spielman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_spielman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_sweeney Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sweeney partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 pi_tsmith Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsmith partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_vaccaro Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_vaccaro partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp pi_zhu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhu partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp Storage Grace has access to a number of filesystems. /vast/palmer hosts Grace's home and scratch directories and /gpfs/gibbs hosts project directories and most additional purchased storage allocations. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots Notes home /vast/palmer/home.grace 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Grace"},{"location":"clusters/grace/#grace","text":"Grace is a shared-use resource for the Faculty of Arts and Sciences (FAS). It consists of a variety of compute nodes networked over low-latency InfiniBand and mounts several shared filesystems. The Grace cluster is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper , who received her Ph.D. in Mathematics from Yale in 1934. Operating System Upgrade During the August 2023 maintenance, the operating system on Grace was upgraded from Red Hat 7 to Red Hat 8. For more information, see our Grace Operating System Upgrade page.","title":"Grace"},{"location":"clusters/grace/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Access the Cluster"},{"location":"clusters/grace/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/grace/#partitions-and-hardware","text":"Grace is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. 
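For quick reference, the two storage utilities mentioned above are run without arguments from a login shell; a minimal sketch (no options assumed beyond what the text describes):

```bash
# Show current usage and quotas for your group's home, project, and scratch spaces
getquota

# Print the absolute paths behind the ~/project and ~/palmer_scratch shortcuts
mydirectories
```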
By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/grace/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 2500 Maximum CPUs per user 1000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 97 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp devel Use the devel partition for jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6126 24 174 skylake, avx512, 6126, nogpu, standard, common 4 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 252 Maximum CPUs per user 108 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 25 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum running jobs per user 2 Maximum CPUs per job 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
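As a sketch of the options described above, the batch script below submits to the day partition and uses --constraint to restrict the job to a node feature listed in the tables; the executable name and resource values are illustrative placeholders, not recommendations:

```bash
#!/bin/bash
#SBATCH --partition=day            # day is also the default partition
#SBATCH --constraint=cascadelake   # node feature from the tables above
#SBATCH --time=02:00:00            # overrides the one-hour default
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5120         # in MiB, matching the partition default

module purge
./my_program                       # hypothetical executable
```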
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 7642 8 237 epyc, 7642, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Info Interactive jobs ( salloc or Open OnDemand) are not allowed in the gpu partition. Please submit those jobs to gpu_devel . GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 24 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 10 Maximum GPUs per user 4 Maximum submitted jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 1 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 1 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 4 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
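A minimal GPU batch sketch based on the gpu partition description above; the GPU type, count, and runtime are illustrative and should be adjusted against the node tables:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=rtx2080ti:2      # GPUs are never allocated unless requested
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00

nvidia-smi                      # confirm which GPUs were allocated to the job
```

For interactive GPU work, the text above directs you to gpu_devel instead, e.g. salloc --partition=gpu_devel --gpus=1 --time=02:00:00 (values illustrative).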
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 40 Maximum memory per user 4000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp mpi Use the mpi partition for tightly-coupled parallel programs that make efficient use of multiple nodes. See our MPI documentation if your workload fits this description. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 10000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
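Because the mpi partition allocates whole nodes (--exclusive) with 24 cores each per the table above, tightly-coupled jobs are typically laid out per node; a sketch with placeholder names:

```bash
#!/bin/bash
#SBATCH --partition=mpi
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=24         # one MPI rank per core on the 24-core nodes
#SBATCH --time=12:00:00

module purge
# module load <your MPI toolchain>   # exact module name depends on your environment
srun ./my_mpi_program                # hypothetical MPI executable
```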
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 87 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 135 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 3 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp 16 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 74 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per user 30 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_mpi Use the scavenge_mpi partition to run preemptable jobs on more MPI resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the scavenge_mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp","title":"Public Partitions"},{"location":"clusters/grace/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_anticevic Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
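Since jobs in the scavenge partitions described above can be preempted at any time, one common (but optional) pattern is to mark them requeueable and run software that can resume from a checkpoint; a sketch:

```bash
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --requeue               # return the job to the queue if it is preempted
#SBATCH --cpus-per-task=4
#SBATCH --time=06:00:00

./my_restartable_program        # hypothetical; must tolerate being restarted
```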
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_anticevic partition are subject to the following limits: Limit Value Maximum job time limit 100-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 20 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_balou Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_balou partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 9 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 26 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_berry Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_berry partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_chem_chase Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_chem_chase partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_cowles Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_cowles partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 120 Maximum nodes per user 5 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 9 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_econ_io Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_io partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_econ_lp Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_lp partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 5 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_esi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_esi partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 648 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 36 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_fedorov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_fedorov partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp pi_gelernter Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gelernter partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_hammes_schiffer Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_hammes_schiffer partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 6 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest pi_hodgson Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hodgson partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_holland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_holland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_howard Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_howard partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_jorgensen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jorgensen partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_kim_theodore Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kim_theodore partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_korenaga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_korenaga partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_lederman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_lederman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6254 36 1505 rtx4000,rtx8000,v100 4,2,2 8,48,16 cascadelake, avx512, 6254, pi, bigtmp, rtx8000 pi_levine Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1952 Job Limits Jobs submitted to the pi_levine partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi pi_lora Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_lora partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 5 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp pi_mak Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_mak partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_manohar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_manohar partition are subject to the following limits: Limit Value Maximum job time limit 180-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 8 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 3 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_owen_miller Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_owen_miller partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 5 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_padmanabhan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_padmanabhan partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_panda Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_panda partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 3 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 pi_poland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_poland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 9 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_polimanti Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_polimanti partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_seto Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_seto partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp pi_spielman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_spielman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_sweeney Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sweeney partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 pi_tsmith Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsmith partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_vaccaro Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_vaccaro partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp pi_zhu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhu partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp","title":"Private Partitions"},{"location":"clusters/grace/#storage","text":"Grace has access to a number of filesystems. /vast/palmer hosts Grace's home and scratch directories and /gpfs/gibbs hosts project directories and most additional purchased storage allocations. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots Notes home /vast/palmer/home.grace 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/grace_rhel8/","text":"Grace Operating System Upgrade Grace's current operating system, Red Hat (RHEL) 7, will be officially end-of-life in 2024 and will no longer be supported with security patches by the developer. Therefore Grace has been upgraded to RHEL 8 during the August maintenance window, August 15-17, 2023. This provides a number of key benefits to Grace: consistency with the McCleary cluster continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between the clusters* * some software and workflows will only be supported by YCRC staff on one of the clusters, e.g. tightly coupled MPI codes (Grace) or RELION (McCleary). While we have done extensive testing both internally and with the new McCleary cluster, we recognize that there are a large number of custom workflows on Grace that may need to be modified to work with the new operating system. To this end, we provided a test partition ahead of the upgrade. Now that the upgrade has been rolled out cluster-wide, the test partitions (e.g. rhel8_day ) have been removed. All jobs should be submitted to the normal partitions, which now contain exclusively RHEL 8 nodes. New Host Key The ssh host keys for Grace's login nodes were changed during the August maintenance, which will result in an error similar to the following when you attempt to log in for the first time after the maintenance. @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! 
@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the lines related to Grace. For MobaXterm, this file is located (by default) in Documents\\MobaXterm\\home\\.ssh . Then attempt a new login and accept the new host key. The valid host keys for the login nodes are as follows: 3072 SHA256:8jJ/dKJVntzBJQWW8pU901PHbWcIe2r8ACvq30zQxKU login1 (RSA) 256 SHA256:vhmGumY/XI/PAaheWQCadspl22/mqMiUiNXk+ov/zRc login1 (ECDSA) 256 SHA256:NWNrMNoLwcqMm+E2NpsKKmirSbku9iXgbfk8ucn5aZE login1 (ED25519) New Software Tree Grace now shares a software module tree with the McCleary cluster, providing a more consistent experience for all our users. Existing applications will continue to be available during this transition period. We plan to deprecate and remove the old application tree during the December 2023 maintenance window. If you experience any issues with software, please let us know at hpc@yale.edu and we can look into reinstalling. Common Errors Python not found Under RHEL8, we have only installed Python 3, which must be executed using python3 (not python ). As always, if you need additional packages, we strongly recommend setting up your own conda environment . In addition, Python 2.7 is no longer supported and therefore not installed by default. To use Python 2.7, we request that you set up a conda environment . Missing System Libraries Some of the existing applications may depend on libraries that are no longer installed in the operating system. If you run into these errors, please email hpc@yale.edu and include which application/version you are using along with the full error message. We will investigate these on a case-by-case basis and work to get the issue resolved. There will be a small number of compute nodes reserved with RHEL7 (in a partition named legacy ) to enable work to continue while we resolve these issues. This partition will remain available until the December maintenance window. Warning Some of the applications in the new shared apps tree may not work perfectly on the legacy RHEL7 nodes. When running jobs in the legacy partition, you should therefore run module purge at the beginning of interactive sessions and add it to the start of your batch scripts. This will ensure that you only load modules built for RHEL7. Report Issues If you continue to have or discover new issues with your workflow, feel free to contact us for assistance. Please include the working directory, the commands that were run, the software modules used, and any more information needed to reproduce the issue.","title":"Grace Operating System Upgrade"},{"location":"clusters/grace_rhel8/#grace-operating-system-upgrade","text":"Grace's current operating system, Red Hat (RHEL) 7, will be officially end-of-life in 2024 and will no longer be supported with security patches by the developer. Therefore Grace has been upgraded to RHEL 8 during the August maintenance window, August 15-17, 2023.
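As a minimal sketch of the host-key cleanup described above (assuming the default OpenSSH known_hosts location; <netid> is a placeholder for your own netid):
# remove any cached host keys for Grace from ~/.ssh/known_hosts
ssh-keygen -R grace.hpc.yale.edu
# reconnect and compare the fingerprint you are shown against the list above before accepting it
ssh <netid>@grace.hpc.yale.edu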
This provides a number of key benefits to Grace: consistency with the McCleary cluster continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between the clusters* * some software and workflows will only be supported by YCRC staff on one of the cluster, e.g. tightly couple MPI codes (Grace) or RELION (McCleary). While we have done extensive testing both internally and with the new McCleary cluster, we recognize that there are a large number custom workflows on Grace that may need to be modified to work with the new operating system. To this end, we provided test partition ahead of the upgrade. Now that the upgrade has been rolled out cluster-wide, the test partitions (e.g. rhel8_day ) have been removed. All jobs should be submitted to the normal partitions, which now contain exclusively RHEL 8 nodes.","title":"Grace Operating System Upgrade"},{"location":"clusters/grace_rhel8/#new-host-key","text":"The ssh host key for Grace's login nodes were changed during the August maintenance, which will result in an error similar to the following when you attempt to login for the first time after the maintenance. @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the lines related to Grace. For MobaXterm, this file is located (by default) in Documents\\MobaXterm\\home\\.ssh . Then attempt a new login and accept the new host key. The valid host keys for the login nodes are as follows: 3072 SHA256:8jJ/dKJVntzBJQWW8pU901PHbWcIe2r8ACvq30zQxKU login1 (RSA) 256 SHA256:vhmGumY/XI/PAaheWQCadspl22/mqMiUiNXk+ov/zRc login1 (ECDSA) 256 SHA256:NWNrMNoLwcqMm+E2NpsKKmirSbku9iXgbfk8ucn5aZE login1 (ED25519)","title":"New Host Key"},{"location":"clusters/grace_rhel8/#new-software-tree","text":"Grace now shares a software module tree with the McCleary cluster, providing a more consistent experience for all our users. Existing applications will continue to be available during this transition period. We plan to deprecate and remove the old application tree during the December 2023 maintenance window. If you experience any issues with software, please let us know at hpc@yale.edu and we can look into reinstalling.","title":"New Software Tree"},{"location":"clusters/grace_rhel8/#common-errors","text":"","title":"Common Errors"},{"location":"clusters/grace_rhel8/#python-not-found","text":"Under RHEL8, we have only installed Python 3, which must be executed using python3 (not python ). As always, if you need additional packages, we strongly recommend setting up your own conda environment . In addition, Python 2.7 is no longer support and therefore not installed by default. To use Python 2.7, we request you setup a conda environment .","title":"Python not found"},{"location":"clusters/grace_rhel8/#missing-system-libraries","text":"Some of the existing applications may depend on libraries that are no longer installed in the operating system. 
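As a rough way to check whether an existing binary is affected by a missing system library (ldd is a standard Linux tool; the path here is a placeholder for your own application):
ldd /path/to/your/application | grep 'not found'   # lists any shared libraries the binary cannot resolve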
If you run into these errors please email hpc@yale.edu and include which application/version you are using along with the full error message. We will investigate these on a case-by-case basis and work to get the issue resolved. There will be a small number of compute nodes reserved with RHEL7 (in a partition named legacy ) to enable work to continue while we resolve these issues. This partition will remain available until the December maintenance window. Warning Some of the applications in the new shared apps tree may not work perfectly on the legacy RHEL7 nodes. When running jobs in the legacy partition, you should therefore run module purge at the begining of interactive sessions and add it to the start of your batch scripts. This will ensure that you only load modules built for RHEL7.","title":"Missing System Libraries"},{"location":"clusters/grace_rhel8/#report-issues","text":"If you continue to have or discover new issues with your workflow, feel free to contact us for assistance. Please include the working directory, the commands that were run, the software modules used, and any more information needed to reproduce the issue.","title":"Report Issues"},{"location":"clusters/maintenance/","text":"Cluster Maintenance Each YCRC cluster undergoes regular scheduled maintenance twice a year. During the maintenance, the cluster is unavailable, logins are deactivated and all pending jobs are held. Unless otherwise stated, the storage for that cluster will also be inaccessible during the maintenance. We use this opportunity when jobs are not running and there are no users on the machine to make upgrades and changes that would be disruptive. These activities include updating and patching the compute resources including the compute nodes, networking, service nodes and storage as well as making changes to critical infrastructure. Each maintenance is scheduled for three days, from Tuesday morning through end of day Thursday of the respective week. In many cases, the cluster may return to service early and, under extenuating circumstances, we may choose to extend maintenance if necessary to make sure the system is stable before restoring access and jobs. Communication will be sent to all users of the respective cluster both 4 weeks and 1 week prior to the maintenance period. Schedule The schedule for the regular cluster maintenance is posted below. Please be mindful of these dates and schedule your work accordingly to avoid disruptions. Date Cluster Dec 5-7 2023 Grace Feb 6-8 2024 Milgram Apr 2-4 2024 McCleary Jun 4-6 2024 Grace Aug 20-22 2024 Milgram Oct 1-3 2024 McCleary Dec 3-5 2024 Grace Occasionally we will schedule additional maintenance periods beyond those listed above, and potentially with shorter notices, if urgent work arises, such as power work on the data center or critical upgrades for stability or security. We will give as much notice as possible in advance of these maintenance outages.","title":"Cluster Maintenance"},{"location":"clusters/maintenance/#cluster-maintenance","text":"Each YCRC cluster undergoes regular scheduled maintenance twice a year. During the maintenance, the cluster is unavailable, logins are deactivated and all pending jobs are held. Unless otherwise stated, the storage for that cluster will also be inaccessible during the maintenance. We use this opportunity when jobs are not running and there are no users on the machine to make upgrades and changes that would be disruptive. 
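Returning to the legacy partition guidance above, a minimal batch-script sketch (the module list and executable are placeholders for your own RHEL7 workflow):
#!/bin/bash
#SBATCH --partition=legacy
#SBATCH --time=01:00:00
module purge                         # drop any RHEL8-built modules inherited from your login environment
# module load <your RHEL7-built modules here>
./my_rhel7_program                   # placeholder executable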
These activities include updating and patching the compute resources including the compute nodes, networking, service nodes and storage, as well as making changes to critical infrastructure. Each maintenance is scheduled for three days, from Tuesday morning through end of day Thursday of the respective week. In many cases, the cluster may return to service early and, under extenuating circumstances, we may choose to extend maintenance if necessary to make sure the system is stable before restoring access and jobs. Communication will be sent to all users of the respective cluster both 4 weeks and 1 week prior to the maintenance period.","title":"Cluster Maintenance"},{"location":"clusters/maintenance/#schedule","text":"The schedule for the regular cluster maintenance is posted below. Please be mindful of these dates and schedule your work accordingly to avoid disruptions. Date Cluster Dec 5-7 2023 Grace Feb 6-8 2024 Milgram Apr 2-4 2024 McCleary Jun 4-6 2024 Grace Aug 20-22 2024 Milgram Oct 1-3 2024 McCleary Dec 3-5 2024 Grace Occasionally we will schedule additional maintenance periods beyond those listed above, and potentially with shorter notices, if urgent work arises, such as power work on the data center or critical upgrades for stability or security. We will give as much notice as possible in advance of these maintenance outages.","title":"Schedule"},{"location":"clusters/mccleary-farnam-ruddle/","text":"McCleary for Farnam and Ruddle Users McCleary is the successor to both the Farnam and Ruddle clusters, which were retired in summer 2023. Key Dates Farnam April: Migration of purchased nodes and storage from Farnam to McCleary June 1st: Access to Farnam login and OnDemand nodes disabled Compute service charges on McCleary commons partitions begin July 13: /gpfs/ysm will no longer be available Ruddle April: Migration of purchased nodes from Ruddle to McCleary June 1st: Official Farnam retirement date, and beginning of compute service charges on McCleary commons partitions. Jobs in the ycga partitions will always be exempt from compute service charge. July 24th: Access to Ruddle login and OnDemand nodes disabled. Old /gpfs/ycga replaced with new system. Accounts Most Farnam and Ruddle users who have been active in the last year have had accounts automatically created on McCleary for them and have received an email to that effect. All other users who conduct life sciences research can request an account using our Account Request form . Group Membership Check which group your new McCleary account is associated with and make sure that matches your expectation. This is the group that will be charged (if/when applicable) for your compute usage as well as dictates which private partitions you may have access to. Any cluster-specific changes previously made on Farnam or Ruddle will not be automatically reflected on McCleary. To check, run the following command (replacing with your netid): sacctmgr show user If you need your group association changed, please let us know at hpc@yale.edu . Access Hostname McCleary can be accessed via SSH (or MobaXterm) at the hostname mccleary.ycrc.yale.edu . Transfers and transfer applications should be connected via transfer-mccleary.ycrc.yale.edu . Note The hostname does not use the domain hpc.yale.edu, but uses ycrc.yale.edu instead. Multifactor authentication via Duo is required for all users on McCleary, similar to how Ruddle is currently configured. This will be new to Farnam users.
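As a quick illustration of the account check and login described above (<netid> is a placeholder for your own netid):
sacctmgr show user <netid>           # confirm the group your McCleary account is associated with
ssh <netid>@mccleary.ycrc.yale.edu   # log in via the ycrc.yale.edu hostname noted above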
For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation . Web Portal (Open OnDemand) The McCleary web portal is available at ood-mccleary.ycrc.yale.edu . On McCleary, you are limited to 4 interactive app instances (of any type) through the web portal at one time. Additional instances will remain pending in the queue until you terminate older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. Note Again, the URL does not use the domain hpc.yale.edu, but uses ycrc.yale.edu instead. Software We have installed most commonly used software modules from Farnam and Ruddle onto McCleary. Usage of modules on McCleary is similar to the other clusters (e.g. module avail , module load ). Some software may only be initially available in a newer version than was installed on Farnam or Ruddle. If you cannot find a software package on McCleary that you need, please let us know at hpc@yale.edu and we can look into installing it for you. Partition and Job Scheduler The most significant change when transitioning from Farnam or Ruddle to McCleary is the partition scheme. McCleary uses the partition scheme used on the Grace and Milgram clusters, so it should be familiar to users of those clusters. A full list of McCleary partitions can be found on the cluster page . Default Time Request The default walltime on McCleary is 1 hour on all partitions, down from 24 hours on Farnam and Ruddle. Use the -t flag to request a longer time limit. Changes to Partitions Below are notable changes to the partitions relative to Farnam and Ruddle. Many of these changes are reductions to the maximum time request. If your job cannot run in the available partition time limits, please contact us at hpc@yale.edu so we can discuss your situation. general McCleary does not have a general partition, but instead has day and week partitions with maximum time limits of 24 hours and 7 days, respectively. The week partition contains significantly fewer nodes than day and will reject any job that requests less than 24 hours of walltime, so please think carefully about how long your job needs to run when selecting a partition. We strongly encourage checkpointing if it is an option or dividing up your workload into chunks of less than 24 hours. This scheme promotes high turnover of compute resources and reduces the number of idle jobs, resulting in lower overall wait time. Interactive jobs are blocked from running in the day or week partitions. See the interactive partition below instead. day is the default partition for batch jobs (where your job goes if you do not specify a partition with -p or --partition ). interactive The interactive partition is called devel and contains a set of dedicated nodes specifically for development or interactive uses ( salloc jobs). To ensure high availability of resources, users are limited to one job at a time. That job cannot request more than 6 hours, 4 CPUs and 32G of memory. devel is the default partition for jobs started using salloc (where your job goes if you do not specify a partition with -p or --partition ). bigmem McCleary has a bigmem partition, but the maximum time request is now 24 hours. Jobs requesting less than 120G of RAM will be rejected from the partition and we ask you to submit those jobs to day . scavenge McCleary has a scavenge partition that operates in the same preemptable mode as before, but the maximum time request is now 24 hours. gpu_devel There is no gpu_devel on McCleary. We are evaluating the needs and potential solutions for interactive GPU-enabled jobs. For now, interactive GPU-enabled jobs should be submitted to the gpu partition. YCGA Compute YCGA researchers have access to a dedicated set of nodes totaling over 3000 cores on McCleary that are prefixed with ycga . ycga : general purpose partition for batch jobs ycga_interactive : partition for interactive jobs (limit of 1 job at a time in this partition) ycga_bigmem : for jobs requiring a large amount of RAM (>120G) Dedicated Nodes If you have purchased nodes on Farnam or Ruddle that are not in the haswell generation, we have coordinated with your group to migrate those nodes to McCleary in April into a partition of the same name. Storage and Data If you have data on the Gibbs filesystem, no action was required as it is already available on McCleary. Farnam Data Farnam\u2019s primary filesystem, YSM (/gpfs/ysm), was retired on July 13th. If you previously had a Farnam account, you have been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Data The YCGA storage system ( /gpfs/ycga ) has been replaced with a new, larger storage system at the same namespace. All data in the project (now at work ), sequencers , special , and pi directories under /gpfs/ycga were migrated by YCRC staff to the new storage system. All other data on /gpfs/ycga (Ruddle home and scratch60) was retired with Ruddle on July 24th. As a McCleary user, you have also been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Project Data Data previously in /gpfs/ycga/project// can now be found at /gpfs/ycga/work// . The project symlink in your home directory links to your Gibbs project space, not your YCGA storage. Researchers with Purchased Storage If you have purchased space on /gpfs/ycga or /gpfs/ysm that has not expired, we have migrated your allocation. This is the only data that the YCRC automatically migrated from Farnam to McCleary. If you have purchased storage on /gpfs/ysm that has expired as of December 31st 2022, you should have received a separate communication from us with information on purchasing replacement storage on Gibbs (which is available on McCleary). If you have any questions or concerns about what has been moved to McCleary and when, please reach out to us. Storage@Yale (SAY) Shares Storage@Yale shares are available on McCleary, but only on the transfer node. To access your SAY data, make sure to log in to the transfer node and then copy your data to either project or scratch .
Note, this is different than how Ruddle was set up, where SAY shares were available on all nodes.","title":"McCleary for Farnam and Ruddle Users"},{"location":"clusters/mccleary-farnam-ruddle/#mccleary-for-farnam-and-ruddle-users","text":"McCleary is the successor to both the Farnam and Ruddle clusters, which were retired in summer 2023.","title":"McCleary for Farnam and Ruddle Users"},{"location":"clusters/mccleary-farnam-ruddle/#key-dates","text":"","title":"Key Dates"},{"location":"clusters/mccleary-farnam-ruddle/#farnam","text":"April: Migration of purchased nodes and storage from Farnam to McCleary June 1st: Access to Farnam login and OnDemand nodes disabled Compute service charges on McCleary commons partitions begin July 13: /gpfs/ysm no longer be available","title":"Farnam"},{"location":"clusters/mccleary-farnam-ruddle/#ruddle","text":"April: Migration of purchased nodes from Ruddle to McCleary June 1st: Official Farnam retirement date, and beginning of compute service charges on McCleary commons partitions. Jobs in the ycga partitions will always be exempt from compute service charge. July 24th: Access to Ruddle login and OnDemand nodes disabled. Old /gpfs/ycga replaced with new system.","title":"Ruddle"},{"location":"clusters/mccleary-farnam-ruddle/#accounts","text":"Most Farnam and Ruddle users who have been active in the last year have accounts automatically created on McCleary for them and have received an email to that effect. All other users who conduct life sciences research can request an account using our Account Request form . Group Membership Check which group your new McCleary account is associated with and make sure that matches your expection. This is the group that will be charged (if/when applicable) for your compute usage as well as dictate which private partitions you may have access to. Any cluster specific changes previously made on Farnam or Ruddle will not be automatically reflected on McCleary. To check, run the following command (replacing with your netid): sacctmgr show user If you need your group association changed, please let us know at hpc@yale.edu .","title":"Accounts"},{"location":"clusters/mccleary-farnam-ruddle/#access","text":"","title":"Access"},{"location":"clusters/mccleary-farnam-ruddle/#hostname","text":"McCleary can be accessed via SSH (or MobaXterm) at the hostname mccleary.ycrc.yale.edu . Transfers and transfer applications should be connected via transfer-mccleary.ycrc.yale.edu . Note The hostname does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead. Multifactor authentication via Duo is required for all users on McCleary, similar to how Ruddle is currently configured. This will be new to Farnam users. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Hostname"},{"location":"clusters/mccleary-farnam-ruddle/#web-portal-open-ondemand","text":"McCleary web portal url is available at ood-mccleary.ycrc.yale.edu . On McCleary, you are limited to 4 interactive app instances (of any type) through the web portal at one time. Additional instances will remain pending in the queue until you terminate older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. 
Note Again, the url does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead.","title":"Web Portal (Open OnDemand)"},{"location":"clusters/mccleary-farnam-ruddle/#software","text":"We have installed most commonly used software modules from Farnam and Ruddle onto McCleary. Usage of modules on McCleary is similar to the other clusters (e.g. module avail , module load ). Some software may only be initially available in a newer version than was installed on Farnam or Ruddle. If you cannot find a software package on McCleary that you need, please let us know at hpc@yale.edu and we can look into installing it for you.","title":"Software"},{"location":"clusters/mccleary-farnam-ruddle/#partition-and-job-scheduler","text":"The most significant changes on transitioning from Farnam or Ruddle to McCleary is in respect to the partition scheme. McCleary uses the partition scheme used on the Grace and Milgram clusters, so should be familiar to users of those clusters. A full list of McCleary partitions can be found on the cluster page .","title":"Partition and Job Scheduler"},{"location":"clusters/mccleary-farnam-ruddle/#default-time-request","text":"The default walltime on McCleary is 1 hour on all partitions, down from 24 hours on Farnam and Ruddle. Use the -t flag to request a longer time limit.","title":"Default Time Request"},{"location":"clusters/mccleary-farnam-ruddle/#changes-to-partitions","text":"Below are notable changes to the partitions relative to Farnam and Ruddle. Many of these changes are reductions to maximum time request. If you job cannot run in the available partition time limits, please contact us at hpc@yale.edu so we can discuss your situation.","title":"Changes to Partitions"},{"location":"clusters/mccleary-farnam-ruddle/#general","text":"McCleary does not have a general partition, but instead has day and week partitions with maximum time limits of 24 hours and 7 days, respectively. The week partition contains significantly fewer nodes than day and will reject any job that request less than 24 hours of walltime, so please think carefully about how long your job needs to run for when selecting a partition. We strongly encourage checkpointing if it is an option or dividing up your workload into less than 24 hour chunks. This scheme promotes high turnover of compute resources and reduces the number of idle jobs, resulting in lower overall wait time. Interactive jobs are blocked from running in the day or week partitions. See the interactive partition below instead. day is the default partition for batch jobs (where your job goes if you do not specify a partition with -p or --partition ).","title":"general"},{"location":"clusters/mccleary-farnam-ruddle/#interactive","text":"The interactive partition is called devel and contains a set of dedicated nodes specifically for development or interactive uses ( salloc jobs). To ensure high availability of resources, users are limited to one job at time. That job cannot request more than 6 hours, 4 cpus and 32G of memory. devel is the default partition for jobs started using salloc (where your job goes if you do not specify a partition with -p or --partition ).","title":"interactive"},{"location":"clusters/mccleary-farnam-ruddle/#bigmem","text":"McCleary has a bigmem partition, but the maximum time request is now 24 hours. 
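As a sketch of how the partition and walltime changes described above look in practice (my_job.sh is a placeholder script; the resource values are arbitrary examples within the stated limits):
sbatch -p day -t 12:00:00 my_job.sh      # batch job with an explicit 12-hour limit instead of the 1-hour default
sbatch -p week -t 2-00:00:00 my_job.sh   # jobs needing more than 24 hours go to week
salloc -t 4:00:00 -c 2 --mem=16G         # interactive session; devel is the default salloc partition and caps at 6 hours, 4 CPUs, 32G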
Jobs requesting less than 120G of RAM will be rejected from the partition and we ask you to submit those jobs to day .","title":"bigmem"},{"location":"clusters/mccleary-farnam-ruddle/#scavenge","text":"McCleary has a scavenge partition that operates in the same preemptable mode as before, but the maximum time request is now 24 hours.","title":"scavenge"},{"location":"clusters/mccleary-farnam-ruddle/#gpu_devel","text":"There is no gpu_devel on McCleary. We are evaluating the needs and potential solutions for interactive GPU-enabled jobs. For now, interactive GPU-enabled jobs should be submitted to the gpu partition.","title":"gpu_devel"},{"location":"clusters/mccleary-farnam-ruddle/#ycga-compute","text":"YCGA researchers have access to a dedicated set of nodes totally over 3000 cores on McCleary that are prefixed with ycga . ycga : general purpose partition for batch jobs ycga_interactive : partition for interactive jobs (limit of 1 job at a time in this partition) ycga_bigmem : for jobs requiring large amount of RAM (>120G)","title":"YCGA Compute"},{"location":"clusters/mccleary-farnam-ruddle/#dedicated-nodes","text":"If you have purchased nodes on Farnam or Ruddle that are not in the haswell generation, we have coordinated with your group to migrate those nodes to McCleary in April into a partition of the same name.","title":"Dedicated Nodes"},{"location":"clusters/mccleary-farnam-ruddle/#storage-and-data","text":"If you have data on the Gibbs filesystem, there was no action required as they are already available on McCleary.","title":"Storage and Data"},{"location":"clusters/mccleary-farnam-ruddle/#farnam-data","text":"Farnam\u2019s primary filesystem, YSM (/gpfs/ysm), was retired on July 13th. If you previously had a Farnam account, you have been give new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu .","title":"Farnam Data"},{"location":"clusters/mccleary-farnam-ruddle/#ruddle-data","text":"The YCGA storage system ( /gpfs/ycga ) has been replaced with a new, larger storage system at the same namespace. All data in the project (now at work ), sequencers , special , and pi directories under /gpfs/ycga were migrated by YCRC staff to the new storage system. All other data on /gpfs/ycga (Ruddle home and scratch60) was retired with Ruddle on July 24th. As a McCleary user, you have also been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Project Data Data previously in /gpfs/ycga/project// can now be found at /gpfs/ycga/work// . The project symlink in your home directory links to your Gibbs project space, not your YCGA storage.","title":"Ruddle Data"},{"location":"clusters/mccleary-farnam-ruddle/#researchers-with-purchased-storage","text":"If you have purchased space on /gpfs/ycga or /gpfs/ysm that has not expired, we have migrated your allocation. This is the only data that the YCRC automatically migrated from Farnam to McCleary. If you have purchased storage on /gpfs/ysm that has expired as of December 31st 2022, you should have received a separate communication from us with information on purchasing replacement storage on Gibbs (which is available on McCleary). 
If you have any questions or concerns about what has been moved to McCleary and when, please reach out to us.","title":"Researchers with Purchased Storage"},{"location":"clusters/mccleary-farnam-ruddle/#storageyale-say-shares","text":"Storage@Yale shares are available on McCleary, but only on the transfer node. To access your SAY data, make sure to login to the transfer node and then copy your data to either project or scratch . Note, this is different than how Ruddle was set up, where SAY shares were available on all nodes.","title":"Storage@Yale (SAY) Shares"},{"location":"clusters/mccleary/","text":"McCleary McCleary is a shared-use resource for the Yale School of Medicine (YSM), life science researchers elsewhere on campus and projects related to the Yale Center for Genome Analysis . It consists of a variety of compute nodes networked over ethernet and mounts several shared filesystems. McCleary is named for Beatrix McCleary Hamburg , who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. The McCleary HPC cluster is Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. Info Farnam or Ruddle user? Farnam and Ruddle were both retired in summer 2023. See our explainer for what you need to know about using McCleary and how it differs from Farnam and Ruddle. Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware McCleary is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Info YCGA sequence data user? To avoid being charged for your cpu usage for YCGA-related work, make sure to submit jobs to the ycga partition with -p ycga. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 512 Maximum memory per group 6000G Maximum CPUs per user 256 Maximum memory per user 3000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 26 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 15 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common devel Use the devel partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 10 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, common week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 192 Maximum memory per group 2949G Maximum CPUs per user 192 Maximum memory per user 2949G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common long Use the long partition for jobs that need a longer runtime than week allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the long partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per group 36 Maximum CPUs per user 36 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 72 8 227 milan, 72F3, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
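As a hedged sketch of a GPU batch request on this partition (the GPU type, resource values, and executable are illustrative; pick a type from the table above):
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=rtx5000:1        # type:count, following the --gpus syntax shown above
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=12:00:00
./my_gpu_program                # placeholder for your own GPU-enabled executable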
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per group 24 Maximum GPUs per user 12 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 14 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2623_v4 8 38 gtx1080ti 4 11 broadwell, E5-2623_v4, singleprecision, common, gtx1080ti bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 32 Maximum memory per user 3960G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 3 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1000 Maximum memory per user 20000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 48 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 40 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 42 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 9 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 19 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 6 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 39 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 
E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per group 100 Maximum GPUs per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . 
PI Partitions (click to expand) pi_breaker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_breaker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 23 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_bunick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_bunick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_butterwick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_butterwick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_chenlab Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_chenlab partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_cryo_realtime Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryo_realtime partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_cryoem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryoem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 6 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_deng Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_deng partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 pi_dewan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dewan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_dijk Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_dijk partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 pi_dunn Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dunn partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_edwards Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_edwards partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_falcone Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_falcone partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 1 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_galvani Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_galvani partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_gerstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gerstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 11 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi pi_gerstein_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_gerstein_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv pi_gruen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gruen partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_hall Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_hall_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_jadi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jadi partition are subject to the following limits: Limit Value Maximum job time limit 365-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_jetz Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jetz partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_kleinstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kleinstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_krishnaswamy Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_krishnaswamy partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 pi_ma Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ma partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_medzhitov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_medzhitov partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_miranker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_miranker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_reinisch Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_reinisch partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 pi_sestan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_sestan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi pi_sigworth Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sigworth partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_sindelar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
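For illustration only (the partition name, GPU type, CPU count, and time below are placeholders to adapt and should match the nodes listed for the partition you use), an interactive session with a single GPU on a pi_ partition could be requested roughly like this:
# request one rtx2080ti GPU, 4 CPUs, and the default 5120 MiB per CPU for two hours
salloc --partition=pi_sindelar --time=02:00:00 --cpus-per-task=4 --mem-per-cpu=5120 --gpus=rtx2080ti:1
If --gpus is omitted, the resulting allocation contains no GPU.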
Job Limits Jobs submitted to the pi_sindelar partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_tomography Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_tomography partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 24 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 pi_townsend Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_townsend partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_tsang Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsang partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, pi pi_ya-chi_ho Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ya-chi_ho partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_yong_xiong Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_yong_xiong partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 pi_zhao Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhao partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi YCGA Partitions The following partitions are intended for projects related to the Yale Center for Genome Analysis . Please do not use these partitions for other projects. Access is granted on a group basis. If you need access to these partitions, please contact us to get approved and added. YCGA Partitions (click to expand) ycga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum CPUs per group 512 Maximum memory per group 3934G Maximum CPUs per user 256 Maximum memory per user 1916G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_admins Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 64 Maximum memory per user 1991G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi ycga_long Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_long partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum CPUs per group 64 Maximum memory per group 479G Maximum CPUs per user 32 Maximum memory per user 239G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi Public Datasets We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data : \u251c\u2500\u2500 alphafold-2.3 \u251c\u2500\u2500 alphafold-2.2 (deprecated) \u251c\u2500\u2500 alphafold-2.0 (deprecated) \u251c\u2500\u2500 annovar \u2502 \u2514\u2500\u2500 humandb \u251c\u2500\u2500 cryoem \u251c\u2500\u2500 db \u2502 \u251c\u2500\u2500 annovar \u2502 \u251c\u2500\u2500 blast \u2502 \u251c\u2500\u2500 busco \u2502 \u2514\u2500\u2500 Pfam \u2514\u2500\u2500 genomes \u251c\u2500\u2500 1000Genomes \u251c\u2500\u2500 10xgenomics \u251c\u2500\u2500 Aedes_aegypti \u251c\u2500\u2500 Bos_taurus \u251c\u2500\u2500 Chelonoidis_nigra \u251c\u2500\u2500 Danio_rerio \u251c\u2500\u2500 Drosophila_melanogaster \u251c\u2500\u2500 Gallus_gallus \u251c\u2500\u2500 hisat2 \u251c\u2500\u2500 Homo_sapiens \u251c\u2500\u2500 Macaca_mulatta \u251c\u2500\u2500 Mus_musculus \u251c\u2500\u2500 Monodelphis_domestica \u251c\u2500\u2500 PhiX \u2514\u2500\u2500 Saccharomyces_cerevisiae \u2514\u2500\u2500 tmp \u2514\u2500\u2500 hisat2 \u2514\u2500\u2500 mouse If you would like us to host a dataset or have questions about what is currently available, please contact us . YCGA Data Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga . For more information on accessing this data as well as sequencing data retention policies, see the YCGA Data documentation . Storage McCleary has access to a number of GPFS filesystems. /vast/palmer is McCleary's primary filesystem where Home and Scratch60 directories are located. Every group on McCleary also has access to a Project allocation on the Gibbs filesystem on /gpfs/gibbs . For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. 
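As a quick sketch of how the storage commands mentioned above are used (both commands are named on this page; their exact output format is not shown here):
# report your group's current usage and quotas on home, project, and scratch
getquota
# print the absolute paths behind the ~/project and ~/palmer_scratch shortcuts
mydirectories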
Partition Root Directory Storage File Count Backups Snapshots home /vast/palmer/home.mccleary 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"McCleary"},{"location":"clusters/mccleary/#mccleary","text":"McCleary is a shared-use resource for the Yale School of Medicine (YSM), life science researchers elsewhere on campus and projects related to the Yale Center for Genome Analysis . It consists of a variety of compute nodes networked over ethernet and mounts several shared filesystems. McCleary is named for Beatrix McCleary Hamburg , who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. The McCleary HPC cluster is Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. Info Farnam or Ruddle user? Farnam and Ruddle were both retired in summer 2023. See our explainer for what you need to know about using McCleary and how it differs from Farnam and Ruddle.","title":"McCleary"},{"location":"clusters/mccleary/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Access the Cluster"},{"location":"clusters/mccleary/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/mccleary/#partitions-and-hardware","text":"McCleary is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Info YCGA sequence data user? To avoid being charged for your cpu usage for YCGA-related work, make sure to submit jobs to the ycga partition with -p ycga. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/mccleary/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 512 Maximum memory per group 6000G Maximum CPUs per user 256 Maximum memory per user 3000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 26 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 15 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common devel Use the devel partition for jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 10 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, common week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 192 Maximum memory per group 2949G Maximum CPUs per user 192 Maximum memory per user 2949G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common long Use the long partition for jobs that need a longer runtime than week allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the long partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per group 36 Maximum CPUs per user 36 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 72 8 227 milan, 72F3, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
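Putting the pieces above together, a minimal batch script for the gpu partition might look like the following sketch (the job name, time limit, GPU type, and placeholder application are illustrative values to adapt to your own work):
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --job-name=gpu_example
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5120
#SBATCH --gpus=gtx1080ti:2    # no GPU is allocated unless one is requested explicitly

# replace this with your actual GPU-enabled application
nvidia-smi
Submit the script with sbatch; the requested resources then take the place of the partition defaults listed below.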
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per group 24 Maximum GPUs per user 12 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 14 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2623_v4 8 38 gtx1080ti 4 11 broadwell, E5-2623_v4, singleprecision, common, gtx1080ti bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 32 Maximum memory per user 3960G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 3 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1000 Maximum memory per user 20000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 48 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 40 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 42 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 9 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 19 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 6 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 39 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 
E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per group 100 Maximum GPUs per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti","title":"Public Partitions"},{"location":"clusters/mccleary/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. 
If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_breaker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_breaker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 23 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_bunick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_bunick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_butterwick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_butterwick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_chenlab Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_chenlab partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_cryo_realtime Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryo_realtime partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_cryoem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryoem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 6 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_deng Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_deng partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 pi_dewan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dewan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_dijk Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_dijk partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 pi_dunn Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dunn partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_edwards Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_edwards partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_falcone Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_falcone partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 1 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_galvani Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_galvani partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_gerstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gerstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 11 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi pi_gerstein_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_gerstein_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv pi_gruen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gruen partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_hall Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_hall_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_jadi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jadi partition are subject to the following limits: Limit Value Maximum job time limit 365-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_jetz Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jetz partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_kleinstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kleinstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_krishnaswamy Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_krishnaswamy partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 pi_ma Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ma partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_medzhitov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_medzhitov partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_miranker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_miranker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_reinisch Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_reinisch partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 pi_sestan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_sestan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi pi_sigworth Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sigworth partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_sindelar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
Job Limits Jobs submitted to the pi_sindelar partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_tomography Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_tomography partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 24 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 pi_townsend Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_townsend partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_tsang Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsang partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, pi pi_ya-chi_ho Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ya-chi_ho partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_yong_xiong Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_yong_xiong partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 pi_zhao Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhao partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi","title":"Private Partitions"},{"location":"clusters/mccleary/#ycga-partitions","text":"The following partitions are intended for projects related to the Yale Center for Genome Analysis . Please do not use these partitions for other projects. Access is granted on a group basis. If you need access to these partitions, please contact us to get approved and added. YCGA Partitions (click to expand) ycga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum CPUs per group 512 Maximum memory per group 3934G Maximum CPUs per user 256 Maximum memory per user 1916G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_admins Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 64 Maximum memory per user 1991G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi ycga_long Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_long partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum CPUs per group 64 Maximum memory per group 479G Maximum CPUs per user 32 Maximum memory per user 239G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi","title":"YCGA Partitions"},{"location":"clusters/mccleary/#public-datasets","text":"We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data : \u251c\u2500\u2500 alphafold-2.3 \u251c\u2500\u2500 alphafold-2.2 (deprecated) \u251c\u2500\u2500 alphafold-2.0 (deprecated) \u251c\u2500\u2500 annovar \u2502 \u2514\u2500\u2500 humandb \u251c\u2500\u2500 cryoem \u251c\u2500\u2500 db \u2502 \u251c\u2500\u2500 annovar \u2502 \u251c\u2500\u2500 blast \u2502 \u251c\u2500\u2500 busco \u2502 \u2514\u2500\u2500 Pfam \u2514\u2500\u2500 genomes \u251c\u2500\u2500 1000Genomes \u251c\u2500\u2500 10xgenomics \u251c\u2500\u2500 Aedes_aegypti \u251c\u2500\u2500 Bos_taurus \u251c\u2500\u2500 Chelonoidis_nigra \u251c\u2500\u2500 Danio_rerio \u251c\u2500\u2500 Drosophila_melanogaster \u251c\u2500\u2500 Gallus_gallus \u251c\u2500\u2500 hisat2 \u251c\u2500\u2500 Homo_sapiens \u251c\u2500\u2500 Macaca_mulatta \u251c\u2500\u2500 Mus_musculus \u251c\u2500\u2500 Monodelphis_domestica \u251c\u2500\u2500 PhiX \u2514\u2500\u2500 Saccharomyces_cerevisiae \u2514\u2500\u2500 tmp \u2514\u2500\u2500 hisat2 \u2514\u2500\u2500 mouse If you would like us to host a dataset or have questions about what is currently available, please contact us .","title":"Public Datasets"},{"location":"clusters/mccleary/#ycga-data","text":"Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga . For more information on accessing this data as well as sequencing data retention policies, see the YCGA Data documentation .","title":"YCGA Data"},{"location":"clusters/mccleary/#storage","text":"McCleary has access to a number of GPFS filesystems. /vast/palmer is McCleary's primary filesystem where Home and Scratch60 directories are located. Every group on McCleary also has access to a Project allocation on the Gibbs filesystem on /gpfs/gibbs . For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. 
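As an illustrative way to spot files that are approaching the 60-day purge window (a sketch only; it uses modification time as a rough stand-in for file age, and the 50-day threshold is just an example):
find ~/palmer_scratch -type f -mtime +50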
Partition Root Directory Storage File Count Backups Snapshots home /vast/palmer/home.mccleary 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/milgram-workstations/","text":"Milgram Workstations Host Name Lab Location cannon1.milgram.hpc.yale.internal Cannon SSS Hall cannon2.milgram.hpc.yale.internal Cannon SSS Hall casey1.milgram.hpc.yale.internal Casey SSS Hall chang1.milgram.hpc.yale.internal Chang Dunham Lab cl1.milgram.hpc.yale.internal Chun SSS Hall cl2.milgram.hpc.yale.internal Chun SSS Hall cl3.milgram.hpc.yale.internal Chun SSS Hall crockett1.milgram.hpc.yale.internal Crockett Dunham Lab gee1.milgram.hpc.yale.internal Gee Kirtland Hall gee2.milgram.hpc.yale.internal Gee Kirtland Hall hl1.milgram.hpc.yale.internal Holmes SSS Hall hl2.milgram.hpc.yale.internal Holmes SSS Hall joormann1.milgram.hpc.yale.internal Joorman Kirtland Hall","title":"Milgram Workstations"},{"location":"clusters/milgram-workstations/#milgram-workstations","text":"Host Name Lab Location cannon1.milgram.hpc.yale.internal Cannon SSS Hall cannon2.milgram.hpc.yale.internal Cannon SSS Hall casey1.milgram.hpc.yale.internal Casey SSS Hall chang1.milgram.hpc.yale.internal Chang Dunham Lab cl1.milgram.hpc.yale.internal Chun SSS Hall cl2.milgram.hpc.yale.internal Chun SSS Hall cl3.milgram.hpc.yale.internal Chun SSS Hall crockett1.milgram.hpc.yale.internal Crockett Dunham Lab gee1.milgram.hpc.yale.internal Gee Kirtland Hall gee2.milgram.hpc.yale.internal Gee Kirtland Hall hl1.milgram.hpc.yale.internal Holmes SSS Hall hl2.milgram.hpc.yale.internal Holmes SSS Hall joormann1.milgram.hpc.yale.internal Joorman Kirtland Hall","title":"Milgram Workstations"},{"location":"clusters/milgram/","text":"Milgram Milgram is a HIPAA aligned cluster intended for use on projects that may involve sensitive data. This applies to both storage and computation. If you have any questions about this policy, please contact us . Milgram is named for Dr. Stanley Milgram, a psychologist who researched the behavioral motivations behind social awareness in individuals and obedience to authority figures. He conducted several famous experiments during his professorship at Yale University including the lost-letter experiment, the small-world experiment, and the Milgram experiment. Milgram Usage Policies Users wishing to use Milgram must agree to the following: All Milgram users must have fulfilled and be current with Yale's HIPAA training requirement. Since Milgram's resources are limited, we ask that you only use Milgram for work on and storage of sensitive data, and that you do your other high performance computing on our other clusters. Multifactor Authentication on Milgram Multifactor authentication via Duo is required for all users on Milgram. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation . Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Info Connections to Milgram can only be made from the Yale VPN ( access.yale.edu )--even if you are already on campus (YaleSecure or ethernet). See our VPN page for setup instructions. 
System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware Milgram is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 324 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp interactive Use the interactive partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per user 72 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. 
You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 4 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp 18 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_shung Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp psych_day Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum GPUs per user 20 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti psych_interactive Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_scavenge Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_scavenge partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_week Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Maximum CPUs in use 448 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest Storage /gpfs/milgram is Milgram's primary filesystem where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Note that the per-user usage breakdown only update once daily. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots home /gpfs/milgram/home 125GiB/user 500,000 Yes >=2 days project /gpfs/milgram/project 1TiB/group, increase to 4TiB on request 5,000,000 Yes >=2 days scratch60 /gpfs/milgram/scratch60 20TiB/group 15,000,000 No No","title":"Milgram"},{"location":"clusters/milgram/#milgram","text":"Milgram is a HIPAA aligned cluster intended for use on projects that may involve sensitive data. This applies to both storage and computation. If you have any questions about this policy, please contact us . Milgram is named for Dr. Stanley Milgram, a psychologist who researched the behavioral motivations behind social awareness in individuals and obedience to authority figures. He conducted several famous experiments during his professorship at Yale University including the lost-letter experiment, the small-world experiment, and the Milgram experiment.","title":"Milgram"},{"location":"clusters/milgram/#milgram-usage-policies","text":"Users wishing to use Milgram must agree to the following: All Milgram users must have fulfilled and be current with Yale's HIPAA training requirement. Since Milgram's resources are limited, we ask that you only use Milgram for work on and storage of sensitive data, and that you do your other high performance computing on our other clusters. Multifactor Authentication on Milgram Multifactor authentication via Duo is required for all users on Milgram. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Milgram Usage Policies"},{"location":"clusters/milgram/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Info Connections to Milgram can only be made from the Yale VPN ( access.yale.edu )--even if you are already on campus (YaleSecure or ethernet). 
See our VPN page for setup instructions.","title":"Access the Cluster"},{"location":"clusters/milgram/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/milgram/#partitions-and-hardware","text":"Milgram is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/milgram/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 324 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp interactive Use the interactive partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per user 72 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
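Putting those pieces together, a complete batch submission might look something like the following, where the script name my_gpu_job.sh and the resource values are placeholders rather than recommendations:
sbatch --partition=gpu --gpus=rtx5000:1 --cpus-per-task=4 --time=02:00:00 my_gpu_job.sh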
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 4 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp 18 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest","title":"Public Partitions"},{"location":"clusters/milgram/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_shung Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp psych_day Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum GPUs per user 20 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti psych_interactive Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_scavenge Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_scavenge partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_week Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Maximum CPUs in use 448 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest","title":"Private Partitions"},{"location":"clusters/milgram/#storage","text":"/gpfs/milgram is Milgram's primary filesystem where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Note that the per-user usage breakdown only update once daily. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots home /gpfs/milgram/home 125GiB/user 500,000 Yes >=2 days project /gpfs/milgram/project 1TiB/group, increase to 4TiB on request 5,000,000 Yes >=2 days scratch60 /gpfs/milgram/scratch60 20TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/ruddle/","text":"Ruddle Ruddle was intended for use only on projects related to the Yale Center for Genome Analysis ; Please do not use this cluster for other projects. If you have any questions about this policy, please contact us . Ruddle was named for Frank Ruddle , a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics. Ruddle Retirement After more than seven years in service, the Ruddle HPC cluster was retired on July 24th. Ruddle was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Ruddle"},{"location":"clusters/ruddle/#ruddle","text":"Ruddle was intended for use only on projects related to the Yale Center for Genome Analysis ; Please do not use this cluster for other projects. If you have any questions about this policy, please contact us . Ruddle was named for Frank Ruddle , a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics. Ruddle Retirement After more than seven years in service, the Ruddle HPC cluster was retired on July 24th. Ruddle was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Ruddle"},{"location":"clusters-at-yale/","text":"Getting Started HPC Clusters Broadly speaking, a high performance computing (HPC) cluster is a collection of networked computers and data storage. We refer to individual servers in this network as nodes. Our clusters are only accessible to researchers remotely; your gateways to the cluster are the login nodes . From these nodes, you view files and dispatch jobs to other nodes across the cluster configured for computation, called compute nodes . The tool we use to manage these jobs is called a job scheduler . 
All compute nodes on a cluster mount a shared filesystem ; a file server or set of servers store files on a large array of disks. This allows your jobs to access and edit your data from any compute node. See our summary of the compute and storage hardware we maintain, from which you can navigate to a detailed description of each cluster. Request an Account The first step in gaining access to one of our clusters is to request an account. All users must adhere to the YCRC HPC Policies . To understand which cluster is appropriate for you and to request an account, visit the account request page . Be a Good Cluster Citizen While using HPC resources, here are some important things to remember: Do not run jobs, transfers or computation on a login node; instead, submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run large numbers of very short (less than a minute) jobs. Use of the clusters is also governed by our official guidelines . Log in Once you have an account, go to our Log on to the Clusters page for login information and configuration. If you want to access the clusters from outside Yale's network, you must use the Yale VPN. Schedule a Job On our clusters, you control your jobs using a job scheduling system called Slurm that allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively . This way you can directly interact with the compute node(s) in real time. The other way, which is the preferred way for multiple jobs or long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. Please see our Slurm documentation or attend the Introduction to HPC workshop for more details. Use Software To best serve the diverse needs of all our researchers, we use software modules to make multiple versions of popular software available. Modules allow you to swap between different applications and versions of those applications with relative ease. We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details. Transfer Your Files You will likely want to copy files between your computer and the clusters. There are a couple of methods available to you, and the best for each situation usually depends on the size and number of files you would like to transfer. For most situations, uploading files through Open OnDemand's upload interface is the best option. This can be done directly through the file viewer interface by clicking the Upload button and dragging and dropping your files into the upload window. For more information on this as well as other upload methods, see our transferring data page. Introduction to HPC Tutorial To help new cluster users navigate their first interactive and batch jobs, we have an Introduction to HPC tutorial that corresponds to the topics discussed in our Introduction to HPC YouTube video . Linux Our clusters run the Linux operating system, where we support the use of the Bash shell. A basic familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. 
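As a quick, minimal illustration of the kind of Bash commands you will use day to day (the directory and file names here are only examples):
pwd            # print the path of the directory you are currently in
ls -l          # list the contents of that directory
cd my_project  # move into a directory named my_project
less notes.txt # page through a text file (press q to quit)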
There are also many excellent beginner tutorials available for free online, including the following: Unix Tutorial for Beginners Interactive Command Line Bootcamp Hands on Training We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available. Get Help If you have additional questions/comments, please contact us . Where applicable, please include the following information: Your NetID Cluster name Partition name Job ID(s) Error messages Command used to submit the job(s) Path(s) to scripts called by the submission command Path(s) to output files from your jobs","title":"Getting Started"},{"location":"clusters-at-yale/#getting-started","text":"","title":"Getting Started"},{"location":"clusters-at-yale/#hpc-clusters","text":"Broadly speaking, a high performance computing (HPC) cluster is a collection of networked computers and data storage. We refer to individual servers in this network as nodes. Our clusters are only accessible to researchers remotely; your gateways to the cluster are the login nodes . From these nodes, you view files and dispatch jobs to other nodes across the cluster configured for computation, called compute nodes . The tool we use to manage these jobs is called a job scheduler . All compute nodes on a cluster mount a shared filesystem ; a file server or set of servers store files on a large array of disks. This allows your jobs to access and edit your data from any compute node. See our summary of the compute and storage hardware we maintain, from which you can navigate to a detailed description of each cluster.","title":"HPC Clusters"},{"location":"clusters-at-yale/#request-an-account","text":"The first step in gaining access to one of our clusters is to request an account. All users must adhere to the YCRC HPC Policies . To understand which cluster is appropriate for you and to request an account, visit the account request page .","title":"Request an Account"},{"location":"clusters-at-yale/#be-a-good-cluster-citizen","text":"While using HPC resources, here are some important things to remember: Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run larger numbers of very short (less than a minute) jobs Use of the clusters is also governed by our official guidelines .","title":"Be a Good Cluster Citizen"},{"location":"clusters-at-yale/#log-in","text":"Once you have an account, go to our Log on to the Clusters page login information and configuration. If you want to access the clusters from outside Yale's network, you must use the Yale VPN.","title":"Log in"},{"location":"clusters-at-yale/#schedule-a-job","text":"On our clusters, you control your jobs using a job scheduling system called Slurm that allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively . This way you can directly interact with the compute node(s) in real time. The other way, which is the preferred way for multiple jobs or long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. 
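As a minimal sketch of that second approach (the job name, resource values, and script name below are illustrative only), a batch script is a plain text file that starts with Slurm directives and ends with the commands you want to run:
#!/bin/bash
#SBATCH --job-name=my_first_job
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=5120
hostname
You would then submit it from a login node with sbatch my_first_job.sh.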
Please see our Slurm documentation or attend the Introduction to HPC workshop for more details.","title":"Schedule a Job"},{"location":"clusters-at-yale/#use-software","text":"To best serve the diverse needs of all our researchers, we use software modules to make multiple versions of popular software available. Modules allow you to swap between different applications and versions of those applications with relative ease. We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details.","title":"Use Software"},{"location":"clusters-at-yale/#transfer-your-files","text":"You will likely want to copy files between your computer and the clusters. There are a couple methods available to you, and the best for each situation usually depends on the size and number of files you would like to transfer. For most situations, uploading files through Open OnDemand's upload interface is the best option. This can be done directly through the file viewer interface by clicking the Upload button and dragging and dropping your files into the upload window. For more information on this as well as other upload methods, see our transferring data page.","title":"Transfer Your Files"},{"location":"clusters-at-yale/#introduction-to-hpc-tutorial","text":"To help new cluster users navigate their first interactive and batch jobs, we have an Introduction to HPC tutorial to correspond with the topics discussed in our Introduction to HPC YouTube video .","title":"Introduction to HPC Tutorial"},{"location":"clusters-at-yale/#linux","text":"Our clusters run the Linux operating system, where we support the use of the Bash shell. A basically familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. There are also many excellent beginner tutorials available for free online, including the following: Unix Tutorial for Beginners Interactive Command Line Bootcamp","title":"Linux"},{"location":"clusters-at-yale/#hands-on-training","text":"We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available.","title":"Hands on Training"},{"location":"clusters-at-yale/#get-help","text":"If you have additional questions/comments, please contact us . Where applicable, please include the following information: Your NetID Cluster name Partition name Job ID(s) Error messages Command used to submit the job(s) Path(s) to scripts called by the submission command Path(s) to output files from your jobs","title":"Get Help"},{"location":"clusters-at-yale/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogeneous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers (called nodes) networked together so nodes can perform the tasks facilitated by scheduling software Command - a specific order given to a computer to execute a service with either an application or the operating system Compute Node - the nodes that computational work runs on Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something no longer operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the File Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (i.e. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associated with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\", can be read and changed in any order and is typically used to store working data Reproducibility - the ability to produce the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting with and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associated with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"clusters-at-yale/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"clusters-at-yale/help-requests/","text":"Help Requests See our Get Help section for ways to get assistance, from email support to setting up 1-on-1 appointments with our staff. When requesting assistance provide the information described below (where applicable), so we can most effectively assist you. Before requesting assistance, we encourage you to take a look at the relevant documentation on this site. If you are new to the cluster, please watch our Intro to HPC tutorial available on the YCRC YouTube Channel as it covers many common usages of the systems. Troubleshoot Login If you are having trouble logging in to the cluster, please see our Troubleshoot Login guide. Information to Provide with Help Requests Whenever requesting assistance with HPC related issues, please provide the YCRC staff with the following information (where applicable) so we can investigate the problem you are encountering. To assist with providing this information, we have included instructions below on retreiving the information if you are working in the command line interface. Your NetID name of the cluster you are working on (e.g. Grace, Milgram, Ruddle or McCleary) instructions on how to repeat your issue. Please include the following: which directory are you working in or where you submitted your job Run the command pwd when you are in the directory where you encountered the issue the software modules you have loaded Run module list when you encounter the issue the commands you ran that resulted in the error or issue the name of the submission script your submitted to the scheduler with sbatch (if reporting an issue with a batch job) the error message you received, and, if applicable, the path to the output file containing the error message if you are using the default Slurm output options, this will look slurm-.out certain software may output additional information to other log files and, if applicable, include the paths to those files as well job ids for your Slurm jobs you can get the job ids for recently run jobs by running the command sacct identify the job(s) that contained the error and provide the job id(s) If possible, please paste the output into the email or include in a text file as an attachment. Screenshots or pictures are very hard for us to work with. We look forwarding to assisting you!","title":"Help Requests"},{"location":"clusters-at-yale/help-requests/#help-requests","text":"See our Get Help section for ways to get assistance, from email support to setting up 1-on-1 appointments with our staff. When requesting assistance provide the information described below (where applicable), so we can most effectively assist you. Before requesting assistance, we encourage you to take a look at the relevant documentation on this site. If you are new to the cluster, please watch our Intro to HPC tutorial available on the YCRC YouTube Channel as it covers many common usages of the systems.","title":"Help Requests"},{"location":"clusters-at-yale/help-requests/#troubleshoot-login","text":"If you are having trouble logging in to the cluster, please see our Troubleshoot Login guide.","title":"Troubleshoot Login"},{"location":"clusters-at-yale/help-requests/#information-to-provide-with-help-requests","text":"Whenever requesting assistance with HPC related issues, please provide the YCRC staff with the following information (where applicable) so we can investigate the problem you are encountering. 
To assist with providing this information, we have included instructions below on retrieving the information if you are working in the command line interface. Your NetID name of the cluster you are working on (e.g. Grace, Milgram, Ruddle or McCleary) instructions on how to repeat your issue. Please include the following: which directory you are working in or where you submitted your job Run the command pwd when you are in the directory where you encountered the issue the software modules you have loaded Run module list when you encounter the issue the commands you ran that resulted in the error or issue the name of the submission script you submitted to the scheduler with sbatch (if reporting an issue with a batch job) the error message you received, and, if applicable, the path to the output file containing the error message if you are using the default Slurm output options, this will look like slurm-<jobid>.out certain software may output additional information to other log files and, if applicable, include the paths to those files as well job ids for your Slurm jobs you can get the job ids for recently run jobs by running the command sacct identify the job(s) that contained the error and provide the job id(s) If possible, please paste the output into the email or include it in a text file as an attachment. Screenshots or pictures are very hard for us to work with. We look forward to assisting you!","title":"Information to Provide with Help Requests"},{"location":"clusters-at-yale/troubleshoot/","text":"Troubleshoot Login Checklist If you are having trouble logging into a cluster, please use the checklist below to check for common issues: Make sure you have submitted an account request and have gotten word that we created your account for the cluster. Make sure that the cluster is online in the System Status page. Check the hostname for the cluster. See the clusters page for a list. Verify that your ssh keys are set up correctly Check for your public key in the ssh key uploader . If you recently uploaded one, it will take a few minutes to appear on the cluster. If you are using macOS or Linux , make sure your private key is in ~/.ssh . If you are using Windows , make sure you have pointed MobaXterm to your private ssh key (ends in .pem) If you are asked for a passphrase when logging in, this is the ssh key passphrase you set when first creating your key pair. If you have forgotten this passphrase, you need to create a new key pair and upload a new public key. Make sure your computer is either on Yale's campus network (ethernet or YaleSecure wireless) or Yale's VPN . If you get an error like could not resolve hostname, you may have lost connection to the Yale network. If you are sure you have not, make sure that you are also using the Yale DNS servers (130.132.1.9,10,11). Your home directory should only be writable by you. If you recently modified the permissions to your home directory and can't log in, contact us and we can fix the permissions for you. If you are using McCleary or Milgram , we require Duo MFA for every login. If following our MFA Troubleshooting steps doesn't work, contact the ITS Help Desk . If none of the above solve your issue, please contact us with your netid and the cluster you are attempting to connect to. Common SSH Errors Permission denied (publickey) This message means that the clusters don't (yet) have the key you are using to authenticate. Make sure you have an account on the cluster you're connecting to, that you have created an ssh key pair , and uploaded the public key . 
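As a quick diagnostic sketch, you can also connect with verbose output, for example ssh -v netid@grace.ycrc.yale.edu (substituting your own netid and cluster), and look at the lines showing which public keys your client offers, or run ssh-add -L and compare the listed keys against the public key you uploaded. 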
If you recently uploaded one, it will take a few minutes appear on the cluster. REMOTE HOST IDENTIFICATION HAS CHANGED! If you are seeing the following error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! .... Offending key in /home/user/.ssh/known_hosts:34 ... This usually means that the keys that identify the cluster login nodes have changed. This can be the result of system upgrades on the cluster (see Grace August 2023 Maintenance ). It could also mean someone is trying to intercept your ssh session. Please contact us if you receive this error outside of known system upgrades. If the host keys have indeed changed on the server you are connecting to, you can edit ~/.ssh/known_hosts and remove the offending line. In the example above, you would need to delete line 34 in ~/.ssh/known_hosts before you re-connect.","title":"Troubleshoot Login"},{"location":"clusters-at-yale/troubleshoot/#troubleshoot-login","text":"","title":"Troubleshoot Login"},{"location":"clusters-at-yale/troubleshoot/#checklist","text":"If you are having trouble logging into a cluster, please use the checklist below to check for common issues: Make sure you have submitted an account request and have gotten word that we created your account for the cluster. Make sure that the cluster is online in the System Status page. Check the hostname for the cluster. See the clusters page for a list. Verify that your ssh keys are setup correctly Check for your public key in the ssh key uploader . If you recently uploaded one, it will take a few minutes appear on the cluster. If you are using macOS or Linux , make sure your private key is in ~/.ssh . If you are using Windows , make sure you have pointed MobaXterm to your private ssh key (ends in .pem) If you are asked for a passphrase when logging in, this is the ssh key passphrase you set when first creating your key pair. If you have forgotten this passphrase, you need to create a new key pair and upload a new public key. Make sure your computer is either on Yale's campus network (ethernet or YaleSecure wireless) or Yale's VPN . If you get an error like could not resolve hostname you may have lost connection to the Yale network. If you are sure you have not, make sure that you are also using the Yale DNS servers (130.132.1.9,10,11). Your home directory should only be writable by you. If you recently modified the permissions to your home directory and can't log in, contact us and we can fix the permissions for you. If you are using McCleary or Milgram , we require Duo MFA for every login. If following our MFA Troubleshooting steps doesn't work, contact the ITS Help Desk . If none of the above solve your issue, please contact us with your netid and the cluster you are attempting to connect to.","title":"Checklist"},{"location":"clusters-at-yale/troubleshoot/#common-ssh-errors","text":"","title":"Common SSH Errors"},{"location":"clusters-at-yale/troubleshoot/#permission-denied-publickey","text":"This message means that the clusters don't (yet) have they key you are using to authenticate. Make sure you have an account on the cluster you're connecting, that you have created an ssh key pair , and uploaded the public key . 
If you recently uploaded one, it will take a few minutes appear on the cluster.","title":"Permission denied (publickey)"},{"location":"clusters-at-yale/troubleshoot/#remote-host-identification-has-changed","text":"If you are seeing the following error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! .... Offending key in /home/user/.ssh/known_hosts:34 ... This usually means that the keys that identify the cluster login nodes have changed. This can be the result of system upgrades on the cluster (see Grace August 2023 Maintenance ). It could also mean someone is trying to intercept your ssh session. Please contact us if you receive this error outside of known system upgrades. If the host keys have indeed changed on the server you are connecting to, you can edit ~/.ssh/known_hosts and remove the offending line. In the example above, you would need to delete line 34 in ~/.ssh/known_hosts before you re-connect.","title":"REMOTE HOST IDENTIFICATION HAS CHANGED!"},{"location":"clusters-at-yale/access/","text":"Log on to the Clusters To log on the cluster, you must first request an account (if you do not already have one). When using the clusters, please review and abide by our HPC usage policies and best practices . Off Campus Access You must be on the campus network to access the clusters. For off-campus access you need to use the Yale VPN . Web Portal - Open OnDemand For most users, we recommend using the web portal, Open OnDemand, to access the clusters. For hostnames and more instructions see our Open OnDemand documentation. SSH Connection For more advanced use cases that are not well supported by the Web Portal (Open OnDemand), you can connect to the clusters over the more traditional SSH connection .","title":"Log on to the Clusters"},{"location":"clusters-at-yale/access/#log-on-to-the-clusters","text":"To log on the cluster, you must first request an account (if you do not already have one). When using the clusters, please review and abide by our HPC usage policies and best practices . Off Campus Access You must be on the campus network to access the clusters. For off-campus access you need to use the Yale VPN .","title":"Log on to the Clusters"},{"location":"clusters-at-yale/access/#web-portal-open-ondemand","text":"For most users, we recommend using the web portal, Open OnDemand, to access the clusters. For hostnames and more instructions see our Open OnDemand documentation.","title":"Web Portal - Open OnDemand"},{"location":"clusters-at-yale/access/#ssh-connection","text":"For more advanced use cases that are not well supported by the Web Portal (Open OnDemand), you can connect to the clusters over the more traditional SSH connection .","title":"SSH Connection"},{"location":"clusters-at-yale/access/accounts/","text":"Accounts & Best Practices The YCRC HPC Policies can found here . All users are required to abide by the described policies. HPC Policies Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run large numbers of very short (less than a minute) jobs. Terminate interactive or Open OnDemand session when no longer in use. 
Idle sessions may be canceled without warning. Avoid workflows that generate large numbers (thousands) of files as these put great stress on the shared filesystem. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer-term storage. Each YCRC cluster undergoes regular scheduled maintenance twice a year; see our maintenance schedule for more details. Group Allocations A research group may request an allocation on one of Yale's HPC clusters . Each group is granted access to the common compute resources and a limited cluster storage allocation . Request an Account You may request an account on a cluster using the account request form . User accounts are personal to individual users and may not be shared. Under no circumstances may any user make use of another user\u2019s account. Inactive Accounts and Account Deletion For security and communication purposes, you must have a valid email address associated with your account. Login privileges will be disabled on a regular basis for any accounts without a valid email address. Therefore, if you are leaving Yale, but will continue to use the cluster on a \"Sponsored netid\" , please contact us to update the email address associated with your account as soon as possible. If you find your login has been disabled, please contact us to provide a valid email address to have your login reinstated. Additionally, an annual account audit is performed on November 1st and any accounts associated with inactive netids (regular and Sponsored netids) will be deactivated at that time. Note that Sponsored netids need to be renewed annually through the appropriate channels. When an account is deactivated, logins and scheduler access are disabled, the home directory is archived for 5 years, and all project data owned by the account is reassigned to the group's PI. The group's PI will receive a report once a year in November with a list of deactivated group members. Every group must have a PI with a valid affiliation with Yale. If your PI has left Yale, you may be asked to identify a new faculty sponsor for your account in order to continue accessing the cluster.","title":"Accounts & Best Practices"},{"location":"clusters-at-yale/access/accounts/#accounts-best-practices","text":"The YCRC HPC Policies can be found here . All users are required to abide by the described policies.","title":"Accounts & Best Practices"},{"location":"clusters-at-yale/access/accounts/#hpc-policies","text":"Do not run jobs, transfers or computation on a login node; instead, submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high-risk data on the clusters, except Milgram . Do not run large numbers of very short (less than a minute) jobs. Terminate interactive or Open OnDemand sessions when no longer in use. Idle sessions may be canceled without warning. Avoid workflows that generate large numbers (thousands) of files as these put great stress on the shared filesystem. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer-term storage. 
Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details.","title":"HPC Policies"},{"location":"clusters-at-yale/access/accounts/#group-allocations","text":"A research group may request an allocation on one of Yale's HPC clusters . Each group is granted access to the common compute resources and a limited cluster storage allocation .","title":"Group Allocations"},{"location":"clusters-at-yale/access/accounts/#request-an-account","text":"You may request an account on a cluster using the account request form . User accounts are personal to individual users and may not be shared. Under no circumstances may any user make use of another user\u2019s account.","title":"Request an Account"},{"location":"clusters-at-yale/access/accounts/#inactive-accounts-and-account-deletion","text":"For security and communication purposes, you must have a valid email address associated with your account. Login privileges will be disable on a regular basis for any accounts without a valid email address. Therefore, if you are leaving Yale, but will continue to use the cluster on a \"Sponsored netid\" , please contact us to update the email address associated with your account as soon as possible. If you find your login has been disabled, please contact us to provide a valid email address to have your login reinstated. Additionally, an annual account audit is performed on November 1st and any accounts associated with an inactive netids (regular and Sponsored netids) will be deactivated at that time. Note that Sponsored netids need to be renewed annually through the appropriate channels. When an account is deactivated, logins and scheduler access are disabled, the home directory is archived for 5 years and all project data owned by the account is reassigned to the group's PI. The group's PI will receive a report once a year in November with a list of deactivated group members. Every group must have a PI with a valid affiliation with Yale. If your PI has left Yale, you may be asked to identify a new faculty sponsor for your account in order to continue accessing the cluster.","title":"Inactive Accounts and Account Deletion"},{"location":"clusters-at-yale/access/advanced-config/","text":"Advanced SSH Configuration Example SSH config The following configuration is an example ssh client configuration file specific to our clusters. You can use it on Linux, Windows Subsystem for Linux (WSL) , and macOS. It allows you to use tab completion of the clusters, without the .ycrc.yale.edu suffixes (i.e. ssh grace or scp ~/my_file grace:my_file should work). It will also allow you to re-use and multiplex authenticated sessions. This means clusters that require Duo MFA will not force you to re-authenticate, as you use the same ssh connection to host multiple sessions. If you attempt to close your first connection with others running, it will wait until all others are closed. Save the text below to ~/.ssh/config and replace NETID with your Yale netid. Lines that begin with # will be ignored. 
# If you use a ssh key that is named something other than id_rsa, # you can specify your private key like this: # IdentityFile ~/.ssh/other_key_rsa # Uncomment the ForwardX11 options line to enable X11 Forwarding by default (no -Y necessary) # On a Mac you still need xquartz installed Host *.ycrc.yale.edu mccleary grace milgram User NETID #ForwardX11 yes # To re-use your connections with multi-factor authentication # Uncomment the two lines below #ControlMaster auto #ControlPath ~/.ssh/tmp/%h_%p_%r Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp For more info on ssh configuration, run: man ssh_config Store Passphrase and Use SSH Agent on macOS By default, macOS won't always remember your ssh key passphrase and keep your ssh key in the agent for SSH agent forwarding. In order to not repeatedly enter your passphrase and instead store it in your keychain, enter the following command on your Mac (just once): ssh-add -K ~/.ssh/id_rsa Or whatever your private key file is named. Note If you use homebrew your default OpenSSH may have changed. To add your key(s) to the system ssh agent, use the absolute path: /usr/bin/ssh-add Then and add the following to your ~/.ssh/config file (create this file if it doesn't exist, or add these settings to the Host *.ycrc.yale.edu ... rule if it does). Host *.ycrc.yale.edu mccleary grace milgram UseKeychain yes AddKeystoAgent yes You can view a list of the keys currently in your agent with: ssh-add -L","title":"Advanced SSH Configuration"},{"location":"clusters-at-yale/access/advanced-config/#advanced-ssh-configuration","text":"","title":"Advanced SSH Configuration"},{"location":"clusters-at-yale/access/advanced-config/#example-ssh-config","text":"The following configuration is an example ssh client configuration file specific to our clusters. You can use it on Linux, Windows Subsystem for Linux (WSL) , and macOS. It allows you to use tab completion of the clusters, without the .ycrc.yale.edu suffixes (i.e. ssh grace or scp ~/my_file grace:my_file should work). It will also allow you to re-use and multiplex authenticated sessions. This means clusters that require Duo MFA will not force you to re-authenticate, as you use the same ssh connection to host multiple sessions. If you attempt to close your first connection with others running, it will wait until all others are closed. Save the text below to ~/.ssh/config and replace NETID with your Yale netid. Lines that begin with # will be ignored. # If you use a ssh key that is named something other than id_rsa, # you can specify your private key like this: # IdentityFile ~/.ssh/other_key_rsa # Uncomment the ForwardX11 options line to enable X11 Forwarding by default (no -Y necessary) # On a Mac you still need xquartz installed Host *.ycrc.yale.edu mccleary grace milgram User NETID #ForwardX11 yes # To re-use your connections with multi-factor authentication # Uncomment the two lines below #ControlMaster auto #ControlPath ~/.ssh/tmp/%h_%p_%r Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. 
Create it with mkdir -p ~/.ssh/tmp For more info on ssh configuration, run: man ssh_config","title":"Example SSH config"},{"location":"clusters-at-yale/access/advanced-config/#store-passphrase-and-use-ssh-agent-on-macos","text":"By default, macOS won't always remember your ssh key passphrase and keep your ssh key in the agent for SSH agent forwarding. To avoid repeatedly entering your passphrase and instead store it in your keychain, enter the following command on your Mac (just once): ssh-add -K ~/.ssh/id_rsa Or whatever your private key file is named. Note If you use homebrew, your default OpenSSH may have changed. To add your key(s) to the system ssh agent, use the absolute path: /usr/bin/ssh-add Then add the following to your ~/.ssh/config file (create this file if it doesn't exist, or add these settings to the Host *.ycrc.yale.edu ... rule if it does). Host *.ycrc.yale.edu mccleary grace milgram UseKeychain yes AddKeysToAgent yes You can view a list of the keys currently in your agent with: ssh-add -L","title":"Store Passphrase and Use SSH Agent on macOS"},{"location":"clusters-at-yale/access/courses/","text":"Courses The YCRC Grace and McCleary clusters can be made available for Yale courses with a suitable computational component. The YCRC hosts over a dozen courses on the clusters every semester. Warning All course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. For Instructors If you are interested in using a YCRC cluster in your Yale course, contact us at research.computing@yale.edu. If at all possible, please let us know of your interest in using a cluster at least two weeks prior to the start of classes so we can plan accordingly, even if you have used the cluster in a previous semester. Course ID Your course will be given a specific courseid based on the Yale course catalog number. This courseid will be used in the course account names, web portal and, if applicable, node reservation. Course Accounts All members of a course, including the instructor and TFs, will be given temporary course accounts. These accounts take the form of courseid_netid . Course accounts are distinct from any research accounts a course member may already have. Use this account if connecting to the cluster via ssh . All course-related accounts are subject to the same policies and expectations as standard accounts . Course Storage Courses on the YCRC clusters are typically granted a standard 1TiB project storage quota, as well as a 125GiB home directory for each course member. If the course needs additional storage beyond the default 1TiB, please contact us at research.computing@yale.edu. See our cluster storage documentation for details about the different classifications of storage. Course-specific Web Portal Your course also has a course-specific web portal, based on Open OnDemand , accessible via the URL (replacing courseid with the id given to your course): courseid.ycrc.yale.edu Course members must use the course URL to log in to course accounts on Open OnDemand--the normal cluster portals are not accessible to course accounts. You will then authenticate using your standard NetID (without the courseid prefix) and password. As with all cluster access, you must be on the VPN to access the web portal if you are off campus. Node Reservations If the instructor has coordinated with the YCRC for dedicated nodes for the course, they are available via a \"reservation\". 
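As a quick check (a sketch, assuming courseid is the id given to your course), you can confirm that the reservation exists and see its node list and time window from a shell with: scontrol show reservation courseid 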
The nodes can be requested using the --reservation=courseid flag. See our Slurm documentation for more information on submitting jobs. In each of the following examples, replace courseid with the id given to your course. Jobs on course reservations are subject to the restrictions of their parent partition (e.g. 24-hour walltime limit on day or 2-day walltime limit on gpu ). If your jobs need to exceed those restrictions, please have your instructor or TF contact us. Course members are welcome to use the public partitions of the cluster. However, we request that students be respectful in their usage so as not to disrupt ongoing research work. Interactive Jobs salloc -p day --reservation=courseid or if the reservation is for GPU nodes salloc -p gpu --gpus=1 --reservation=courseid Batch Jobs Add the following to your submission script: #SBATCH --reservation=courseid or if the reservation is for GPU nodes #SBATCH -p gpu --gpus=1 --reservation=courseid Web Portal In any of the app submission forms, type the courseid into the \"Reservation\" field. For standard (non-gpu) nodes, select day in the \"Partition\" field. If your node reservation contains GPU-enabled nodes, select gpu . Any course-specific apps listed under the \"Courses\" dropdown will automatically send all submitted jobs to the reservation, if one exists. Cluster Maintenance Each cluster is inaccessible twice a year for a three-day regularly scheduled maintenance. The maintenance schedule is published here . Please account for the cluster unavailability when developing course schedules and (for students) completing your assignments. End of Semester Course Deletion As mentioned above, all course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. If you would like to retain any data in your course account, please download it prior to the deletion date or, if applicable, submit a request to hpc@yale.edu to transfer the data into your research account. A reminder of the removal will be sent to the instructor to see if it needs to be delayed for any incompletes (for example). Students will not receive a reminder. Instructors, if you would like to retain course materials for future semesters, please copy them off the cluster or to a research account. Transfer Data to Research Account If you have a research account on the cluster, you can transfer any data you want to save from your course account to your research account. Warning Make sure there is sufficient free space in your research account storage (check with getquota ) to accommodate any data you are transferring from your course account. Log in to the cluster using your course account either via Terminal or the Shell app in the OOD web portal. Grant your research account access to your course account's directories (substitute in your courseid and netid in the example). # home directory setfacl -m u:netid:rX /home/courseid_netid # project directory on Grace and McCleary setfacl -m u:netid:rX /gpfs/gibbs/project/courseid/courseid_netid Log in as your research account. Check that you can access the above paths. Move to the transfer node with ssh transfer . If you are transferring a lot of data, open a tmux session so the transfer can continue if you disconnect from the cluster. Initiate a copy of the desired data using rsync . 
For example: mkdir /gpfs/gibbs/project/group/netid/my_course_data rsync -av /gpfs/gibbs/project/courseid/courseid_netid/mydata /gpfs/gibbs/project/group/netid/my_course_data","title":"Courses"},{"location":"clusters-at-yale/access/courses/#courses","text":"The YCRC Grace and McCleary clusters can be made available for Yale courses with a suitable computational component. The YCRC hosts over a dozen courses on the clusters every semester. Warning All course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. For Instructors If you are interested in using a YCRC cluster in your Yale course, contact us at research.computing@yale.edu. If at all possible, please let us know of your interest in using a cluster at least two weeks prior to start of classes so we can plan accordingly, even if you have used the cluster in a previous semester.","title":"Courses"},{"location":"clusters-at-yale/access/courses/#course-id","text":"Your course will be give a specific courseid based on the Yale course catalog number. This courseid will be used in the course account names, web portal and, if applicable, node reservation.","title":"Course ID"},{"location":"clusters-at-yale/access/courses/#course-accounts","text":"All members of a course, including the instructor and TFs will be give temporary course accounts. These accounts take the form of courseid_netid . Course accounts are district from any research accounts a course member may already have. Use this account if connecting to the cluster via ssh . All course-related accounts are subject to the same policies and expectation as standard accounts .","title":"Course Accounts"},{"location":"clusters-at-yale/access/courses/#course-storage","text":"Courses on the YCRC clusters are typically granted a standard 1TiB project storage quota, as well as 125GiB home directory for each course member. If the course needs additional storage beyond the default 1TiB, please contact us at research.computing@yale.edu. See our cluster storage documentation for details about the different classifications of storage.","title":"Course Storage"},{"location":"clusters-at-yale/access/courses/#course-specific-web-portal","text":"Your course also has a course-specific web portal, based on Open OnDemand , accessible via the URL (replacing courseid with the id given to your course): courseid.ycrc.yale.edu Course members must use the course URL to log in to course accounts on Open OnDemand--the normal cluster portals are not accessible to course accounts. You will then authenticate using your standard NetID (without the courseid prefix) and password. As with all cluster access, you must be on the VPN to access the web portal if you are off campus.","title":"Course-specific Web Portal"},{"location":"clusters-at-yale/access/courses/#node-reservations","text":"If the instructor has coordinated with the YCRC for dedicated nodes for the course, they are available via a \"reservation\". The nodes can be requested using the --reservation=courseid flag. See our Slurm documentation for more information on submitting jobs. In each of the following examples, replace courseid with the id given to your course. Jobs on course reservations are subject to the restrictions of their parent partition (e.g. 24 hour walltime limit on day or 2-day walltime limit on gpu ). If your jobs need to exceed those restrictions, please have your instructor or TF contact us. 
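To put these pieces together, a minimal batch script that uses a course reservation might look like the following flattened sketch, where courseid, the resource requests, and the hw1 file names are all placeholders: #!/bin/bash #SBATCH --job-name=hw1 #SBATCH --partition=day #SBATCH --reservation=courseid #SBATCH --cpus-per-task=1 #SBATCH --time=01:00:00 module load miniconda python hw1.py It would then be submitted with sbatch hw1.sh . 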
Course members are welcome to use the public partitions of the cluster. However, we request that students be respectful in their usage as not to disrupt ongoing research work.","title":"Node Reservations"},{"location":"clusters-at-yale/access/courses/#interactive-jobs","text":"salloc -p day --reservation=courseid or if the reservation is for GPU nodes salloc -p gpu --gpus=1 --reservation=courseid","title":"Interactive Jobs"},{"location":"clusters-at-yale/access/courses/#batch-jobs","text":"Add the following to your submission script: #SBATCH --reservation=courseid or if the reservation is for GPU nodes #SBATCH -p gpu --gpus=1 --reservation=courseid","title":"Batch Jobs"},{"location":"clusters-at-yale/access/courses/#web-portal","text":"In any of the app submission forms, type the courseid into the \"Reservation\" field. For standard (non-gpu) nodes, select day in the \"Partition\" field. If your node reservation contains GPU-enabled nodes, select gpu . Any course-specific apps listed under the \"Courses\" dropdown will automatically send all submitted jobs to the reservation, if one exists.","title":"Web Portal"},{"location":"clusters-at-yale/access/courses/#cluster-maintenance","text":"Each cluster is inaccessible twice a year for a three day regularly scheduled maintenance. The maintenance schedule is published here . Please account for the cluster unavailability when developing course schedules and (for students) completing your assignments.","title":"Cluster Maintenance"},{"location":"clusters-at-yale/access/courses/#end-of-semester-course-deletion","text":"As mentioned above, all course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. If you would like to retain any data in your course account, please download it prior to the deletion date or, if applicable, submit a request to hpc@yale.edu to transfer the data into your research account. A reminder of the removal will be sent to the instructor to see if it needs to be delayed for any incompletes (for example). Students will not received a reminder. Instructors, if you would like to retain course materials for future semesters, please copy them off the cluster or to a research account.","title":"End of Semester Course Deletion"},{"location":"clusters-at-yale/access/courses/#transfer-data-to-research-account","text":"If you have a research account on the cluster, you can transfer any data you want to save from your course account to your research account. Warning Make sure there is sufficient free space in your research account storage to accomodate any data you are transferring from your course account using getquota . Login to the cluster using your course account either via Terminal or the Shell app in the OOD web portal. Grant your research account access to your course accounts directories (substitute in your courseid and netid in the example). # home directory setfacl -m u:netid:rX /home/courseid_netid # project directory on Grace and McCleary setfacl -m u:netid:rX /gpfs/gibbs/project/courseid/courseid_netid Log in as your research account. Check that you can access the above paths. Move to the transfer node with ssh transfer . If you are transferring a lot of data, open a tmux session so the transfer can continue if you disconnect from the cluster. Initiate a copy of the desired data using rsync . 
For example: mkdir /gpfs/gibbs/project/group/netid/my_course_data rsync -av /gpfs/gibbs/project/courseid/courseid_netid/mydata /gpfs/gibbs/project/group/netid/my_course_data","title":"Transfer Data to Research Account"},{"location":"clusters-at-yale/access/mfa/","text":"Multi-factor Authentication To improve security, access to McCleary and Milgram requires both a public key and multi-factor authentication (MFA). We use the same MFA (Duo) as is used elsewhere at Yale. To get set up with Duo, see these instructions. You will need to upload your ssh public key to our site . For more info on how to use ssh, please see the SSH instructions . Once you've set up Duo and your key is registered, you can log in to the cluster. Use ssh to connect to your cluster of choice, and you will be prompted for a passcode or to select a notification option. We recommend choosing Duo Push (option 1). If you choose this option, you should receive a notification on your phone. Once approved, you should be allowed to continue to log in. Note You can set up more than one phone for Duo. For example, you can set up your smartphone plus your office landline. That way, if you forget or lose your phone, you can still authenticate. For instructions on how to add additional phones, go here . Connection Multiplexing and File Transfers with DUO MFA Some file transfer clients attempt new and sometimes multiple concurrent connections to transfer files for you. When this happens, you will be asked to Duo authenticate for each connection. SSH Config File On macOS and Linux-based systems, setting up a config file lets you re-use your authenticated sessions for command-line tools and tools that respect your ssh configuration. An example config file is shown below, which enables SSH multiplexing ( ControlMaster ) by caching connections in a directory ( ControlPath ) for a period of time (2h, ControlPersist ). Host *.ycrc.yale.edu mccleary grace milgram User NETID # Uncomment below to enable X11 forwarding without `-Y` #ForwardX11 yes # To re-use your connections with multi-factor authentication ControlMaster auto ControlPath ~/.ssh/tmp/%h_%p_%r ControlPersist 2h Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp CyberDuck CyberDuck's interface with MFA can be streamlined with a few additional configuration steps. Under Cyberduck > Preferences > Transfers > General, change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect, type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call MobaXTerm MobaXTerm is able to cache MFA connections to reduce the frequency of push notifications. Under Settings > SSH > Advanced SSH settings, set the ssh browser type to scp (enhanced speed) as seen here: MobaXTerm SSH Settings WinSCP Similarly, WinSCP can reuse existing SSH connections to reduce the frequency of push notifications. Under Options > Preferences > Background (under Transfer): Set Maximal number of transfers at the same time: to 1 Check the Use multiple connections for single transfer box Click OK to save settings Troubleshoot MFA If you are having problems initially registering Duo, please contact the Yale ITS Helpdesk . 
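If you are still prompted for Duo on every transfer even with the multiplexing settings above, one diagnostic sketch (assuming the Host aliases from the sample config) is to ask whether the cached control connection is alive with ssh -O check mccleary ; if no master is running, the next ssh login will re-authenticate and create a new cached connection. 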
If you have successfully used MFA connect to a cluster before, but cannot now, first please check the following: Test MFA using http://access.yale.edu Verify that your ssh client is using the correct login node Verify you are attempting to connect from a Yale machine or via the proper VPN If all of this is true, please contact us . Include the following information (and anything else you think is helpful): Your netid Have you ever successfully used ssh and Duo to connect to a cluster? How long have you been having problems? Where are you trying to connect from? (fully qualified hostname/IP, if possible) Are you using a VPN? What is the error message you see?","title":"Multi-factor Authentication"},{"location":"clusters-at-yale/access/mfa/#multi-factor-authentication","text":"To improve security, access to McCleary and Milgram requires both a public key and multi-factor authentication (MFA). We use the same MFA (Duo) as is used elsewhere at Yale. To get set up with Duo, see these instructions. You will need upload your ssh public key to our site . For more info on how to use ssh, please see the SSH instructions . Once you've set up Duo and your key is registered, you can log in to the cluster. Use ssh to connect to your cluster of choice, and you will be prompted for a passcode or to select a notification option. We recommend choosing Duo Push (option 1). If you chose this option you should receive a notification on your phone. Once approved, you should be allowed to continue to log in. Note You can set up more than one phone for Duo. For example, you can set up your smartphone plus your office landline. That way, if you forget or lose your phone, you can still authenticate. For instructions on how to add additional phones go here .","title":"Multi-factor Authentication"},{"location":"clusters-at-yale/access/mfa/#connection-multiplexing-and-file-transfers-with-duo-mfa","text":"Some file transfer clients attempt new and sometimes multiple concurrent connections to transfer files for you. When this happens, you will be asked to Duo authenticate for each connection.","title":"Connection Multiplexing and File Transfers with DUO MFA"},{"location":"clusters-at-yale/access/mfa/#ssh-config-file","text":"On macOS and Linux-based systems setting up a config file lets you re-uses your authenticated sessions for command-line tools and tools that respect your ssh configuration. An example config file is shown below which enables SSH multiplexing ( ControlMaster ) by caching connections in a directory ( ControlPath ) for a period of time (2h, ControlPersist ). Host *.ycrc.yale.edu mccleary grace milgram User NETID # Uncomment below to enable X11 forwarding without `-Y` #ForwardX11 yes # To re-use your connections with multi-factor authentication ControlMaster auto ControlPath ~/.ssh/tmp/%h_%p_%r ControlPersist 2h Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp","title":"SSH Config File"},{"location":"clusters-at-yale/access/mfa/#cyberduck","text":"CyberDuck's interface with MFA can be stream-lined with a few additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. 
\"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call","title":"CyberDuck"},{"location":"clusters-at-yale/access/mfa/#mobaxterm","text":"MobaXTerm is able to cache MFA connections to reduce the frequency of push notifications. Under Settings > SSH > Advanced SSH settings set the ssh browser type to scp (enhanced speed) as seen here: MobaXTerm SSH Settings","title":"MobaXTerm"},{"location":"clusters-at-yale/access/mfa/#winscp","text":"Similarly, WinSCP can reuse existing SSH connections to reduce the frequency of push notifications. Under Options > Preferences > Background (under Transfer) and: Set Maximal number of transfers at the same time: to 1 Check the Use multiple connections for single transfer box Click OK to save settings","title":"WinSCP"},{"location":"clusters-at-yale/access/mfa/#troubleshoot-mfa","text":"If you are having problems initially registering Duo, please contact the Yale ITS Helpdesk . If you have successfully used MFA connect to a cluster before, but cannot now, first please check the following: Test MFA using http://access.yale.edu Verify that your ssh client is using the correct login node Verify you are attempting to connect from a Yale machine or via the proper VPN If all of this is true, please contact us . Include the following information (and anything else you think is helpful): Your netid Have you ever successfully used ssh and Duo to connect to a cluster? How long have you been having problems? Where are you trying to connect from? (fully qualified hostname/IP, if possible) Are you using a VPN? What is the error message you see?","title":"Troubleshoot MFA"},{"location":"clusters-at-yale/access/ood/","text":"Web Portal (Open OnDemand) Open OnDemand (OOD) is platform for accessing the clusters that only requires a web browser. This web-portal provides a shell, file browser, and graphical interface for certain apps (like Jupyter or MATLAB). Access If you access Open OnDemand installed on YCRC clusters from off campus, you will need to first connect to the Yale VPN . Open OnDemand is available on each cluster using your NetID credentials (CAS login). The Yale CAS login is configured with the DUO authentication. We recommend that you click \"Remember me for 90 days\" when you are prompted to choose an authentication menthod for DUO. This will simplified the login process. Cluster OOD site Grace ood-grace.ycrc.yale.edu McCleary ood-mccleary.ycrc.yale.edu Milgram ood-milgram.ycrc.yale.edu The above four URLs are also called cluster OOD URLs. They are available to any user with a research account (also called a lab account) on the clusters. Your research account is the same as your NetID. OOD for Courses Each course on the YCRC clusters has its own URL to access OOD on the cluster. The URL is unique to each course and is also called course OOD. Course OODs all follow the same naming convention: coursename.ycrc.yale.edu . 'courename' is an abbreviated name given to the course by YCRC. Students must use the course URL to log in to OOD. They will with their NetID to log in but work under their student account on the cluster while they are in OOD. Course OOD and cluster OOD have different URLs, even if they use the same physical machine. Student accounts can only log in to OOD through a course OOD URL, and a regular account (same as your NetID) can only log in through the cluster OOD URL. 
Warning If you only have a student account, but try to log in through the cluster OOD URL, you will get an error in the browser: Error -- can't find user for cpsc424_test Run 'nginx_stage --help' to see a full list of available command line options. Using the URL for your course OOD will resolve the problem. Additional information about course OOD can be found at academic support . The Dashboard On login you will see the OOD dashboard. Along the top are pull-down menus for various Apps, including File Managers, Job Composer, a Shell, a list of Interactive Apps, etc. File Browser The file browser is a graphical interface to manage, upload, and download files from the clusters. You can use the built-in file editor to view and edit files from your browser without having to download and upload scripts. You can also drag-and-drop to download and upload files and directories, and move files between directories using this interface. Customize Favorite Paths Users are allowed to customize favorite paths in the file manager. Use the scripts below to add, remove, and list customized paths: ood_add_path ood_remove_path ood_list_path When you run ood_add_path from a shell command line, it will prompt you to add one path at a time, until you type 'n' to discontinue. All the paths added by you will be shown in the OOD pull-down menu for the file manager, as well as the left pane when the file manager is opened. ood_remove_path allows you to remove any of the paths added by you, and ood_list_path will list all the paths added by you. After you have customized the path configuration from a shell, go to the OOD dashboard and click Develop -> Restart Web Server on the top menu bar to make the change effective immediately. Shell You can launch a traditional command-line interface to the cluster using the Shell pull-down menu. This opens a terminal in a web browser that you can use in the exact same way as when logging into the cluster via SSH. This is a convenient way to access the clusters when you don't have access to an ssh client or do not have your ssh keys. Interactive Apps We have deployed a selection of common graphical programs as Interactive Apps on Open OnDemand. Currently, we have apps for Remote Desktop, MATLAB, Mathematica, RStudio Desktop, RStudio Server, and Jupyter Notebook, etc. Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. Remote Desktop Occasionally, it is helpful to use a graphical interface to explore data or run certain programs. In the past, your options were to use VNC or X11 forwarding . These tools can be complex to set up or suffer from reduced performance. The Remote Desktop app from OOD simplifies the configuration of a VNC desktop session on a compute node. The MATLAB, Mathematica, and RStudio Desktop Apps are special versions of this app. To get started, choose Remote Desktop (or another desktop app) from the Interactive Apps menu on the dashboard. Use the form to request resources and decide what partition your job should run on. Use devel ( interactive on Milgram) or your lab's partition. Once you launch the job, you will be presented with a notification that your job has been queued. Depending on the resources requested, you may need to wait for a bit. 
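While you wait, note that interactive app sessions are ordinary Slurm jobs, so from a shell you can check on them with, for example, squeue -u netid (replacing netid with your own); the reason shown in parentheses for a pending job explains what it is waiting for. 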
When the job starts you will see the option to launch the Remote Desktop: Note you can share a view only link for your session if you would like to share your screen. After you click on Launch Remote Desktop, a standard desktop interface will open in a new tab. Copy/Paste In some browsers, you may have to use a special text box to copy and paste from the Remote Desktop App. Click the arrow on the left side of your window for a menu, then click the clipboard icon to get access to your Remote Desktop's clipboard. Jupyter One of the most common uses of Open OnDemand is the Jupyter interface for Python and R. You can choose either Jupyter Notebook or Jupyter Lab. By default, this app will try to launch Jupyter Notebook, unless the Start JupyterLab checkbox is selected. Make sure that you chose the right Conda environment for you from the drop-down menu. If you have not yet set one up, follow our instructions on how to create a new one. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Root directory The Jupyter root directory is set to your Home when started. Project and Scratch can be accessed via their respective symlinks in Home. If you want to access a directory that cannot be acessed through your home directory, for example Gibbs, you need to create a symlink to that directory in your home directory. ycrc_default The ycrc_default conda environment will be automatically built when you select it for the first time from Jupyter. You can also build your own Jupyter and make it available to OOD: module load miniconda conda create -n env_name jupyter jupyter-lab ycrc_conda_env.sh update Once created, ycrc_default will not be updated by OOD automatically. It must be updated by the user manually. To update ycrc_default , run the following command from a shell command line: module load miniconda conda update -n ycrc_default jupyter jupyter-lab RStudio Server Change User R Package Path To change the default path where packages installed by the user are stored, you need to add the following line of code in your $HOME/.bashrc : export R_LIBS_USER = path_to_your_local_r_packages Configure the Graphic Device When you plot in a RStudio session, you may encounter the following error: Error in RStudioGD () : Shadow graphics device error: r error 4 ( R code execution error ) In addition: Warning message: In grDevices:::png ( \"/tmp/RtmpcRxRaB/4v3450e3627g4432fa27f516348657267.png\" , : unable to open connection to X11 display '' To fix the problem, you need to configure your RStudio session to use Cairo for plotting. You can do it in your code as follows: options ( bitmapType = 'cairo' ) Alternatively, you can put the above code in .Rprofile in your home directory and the option will be picked up automatically. Clean RStudio If RStudio becomes slow to respond or completely stops responding, please stop the RStudio session and then run the following script at a shell command line: clean_rstudio.sh This will remove any temporary files created by RStudio and allow it to start anew. Troubleshoot OOD An OOD session is started and then completed immediately Check if your quota is full Reset your .bashrc and .bash_profile to their original contents (you can backup the startup files before resetting them. 
Add the changes back one at a time to see if one or more of the changes would prevent OOD from starting properly) Remove the default module collection file $HOME/.lmod.d/default.cluster-rhel8 (cluster is one of the following: grace, mccleary) or $HOME/.lmod.d/default.milgram-rhel7 for Milgram. Remote Desktop (or MATLAB, Mathematica, etc) cannot be started properly Make sure there is no initialization left by conda init in your .bashrc . Clean it with sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Run dbus-launch and make sure you see the following output: [ pl543@grace1 ~ ] $ which dbus-launch /usr/bin/dbus-launch Jupyter cannot be started properly If you are trying to launch jupyter-notebook , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-notebook /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook If you are trying to launch jupyter-lab , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-lab /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-lab RStudio with Conda R If you see NOT_FOUND in \"Conda R Environment\", it means your Conda R environment has not been properly installed. You may need to reinstall your Conda R environment and make sure r-base and r-essentials are both included. RStudio Server does not respond If you encounter a grey screen after clicking the \"Connect to RStudio Server\" button, please stop the RStudio session and run clean_rstudio.sh at a shell command line.","title":"Web Portal (Open OnDemand)"},{"location":"clusters-at-yale/access/ood/#web-portal-open-ondemand","text":"Open OnDemand (OOD) is a platform for accessing the clusters that only requires a web browser. This web portal provides a shell, file browser, and graphical interface for certain apps (like Jupyter or MATLAB).","title":"Web Portal (Open OnDemand)"},{"location":"clusters-at-yale/access/ood/#access","text":"If you access Open OnDemand installed on YCRC clusters from off campus, you will need to first connect to the Yale VPN . Open OnDemand is available on each cluster using your NetID credentials (CAS login). The Yale CAS login is configured with DUO authentication. We recommend that you click \"Remember me for 90 days\" when you are prompted to choose an authentication method for DUO. This will simplify the login process. Cluster OOD site Grace ood-grace.ycrc.yale.edu McCleary ood-mccleary.ycrc.yale.edu Milgram ood-milgram.ycrc.yale.edu The above URLs are also called cluster OOD URLs. They are available to any user with a research account (also called a lab account) on the clusters. Your research account is the same as your NetID.","title":"Access"},{"location":"clusters-at-yale/access/ood/#ood-for-courses","text":"Each course on the YCRC clusters has its own URL to access OOD on the cluster. The URL is unique to each course and is also called course OOD. Course OODs all follow the same naming convention: coursename.ycrc.yale.edu . 'coursename' is an abbreviated name given to the course by the YCRC. Students must use the course URL to log in to OOD. They will log in with their NetID but work under their student account on the cluster while they are in OOD. Course OOD and cluster OOD have different URLs, even if they use the same physical machine. Student accounts can only log in to OOD through a course OOD URL, and a regular account (same as your NetID) can only log in through the cluster OOD URL. 
Warning If you only have a student account, but try to log in through the cluster OOD URL, you will get an error in the browser: Error -- can't find user for cpsc424_test Run 'nginx_stage --help' to see a full list of available command line options. Use the URL for your course OOD will resolve the problem. Additional information about course OOD can be found at academic support .","title":"OOD for Courses"},{"location":"clusters-at-yale/access/ood/#the-dashboard","text":"On login you will see the OOD dashboard. Along the top are pull-down menus for various Apps, including File Managers, Job Composer, a Shell, a list of Interactive Apps, etc.","title":"The Dashboard"},{"location":"clusters-at-yale/access/ood/#file-browser","text":"The file browser is a graphical interface to manage, upload, and download files from the clusters. You can use the built-in file editor to view and edit files from your browser without having to download and upload scripts. You can also drag-and-drop to download and upload files and directories, and move files between directories using this interface.","title":"File Browser"},{"location":"clusters-at-yale/access/ood/#customize-favorite-paths","text":"Users are allowed to customize favorite paths in the file manager. Using the scripts below to add, remove, and list customized paths: ood_add_path ood_remove_path ood_list_path When you run ood_add_path from a shell command line, it will prompt you to add one path at a time, until you type 'n' to discontinue. All the paths added by you will be shown in the OOD pull-down menu for the file manager, as well as the left pane when the file manager is opened. ood_remove_path allows you to remove any of the paths added by you and ood_list_path will list all the paths added by you. After you have customized the path configuration from a shell, go to the OOD dashbaord and click Develop -> Restart Web Server on the top menu bar to make the change effective immediately.","title":"Customize Favorite Paths"},{"location":"clusters-at-yale/access/ood/#shell","text":"You can launch a traditional command-line interface to the cluster using the Shell pull-down menu. This opens a terminal in a web-browser that you can use in the exact same way as when logging into the cluster via SSH. This is a convenient way to access the clusters when you don't have access to an ssh client or do not have your ssh keys.","title":"Shell"},{"location":"clusters-at-yale/access/ood/#interactive-apps","text":"We have deployed a selection of common graphical programs as Interactive Apps on Open OneDemand. Currently, we have apps for Remote Desktop, MATLAB, Mathematica, RStudio Desktop, RStudio Server, and Jupyter Notebook, etc. Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Interactive Apps"},{"location":"clusters-at-yale/access/ood/#remote-desktop","text":"Occasionally, it is helpful to use a graphical interface to explore data or run certain programs. In the past your options were to use VNC or X11 forwarding . These tools can be complex to setup or suffer from reduced performance. The Remote Desktop app from OOD simplifies the configuration of a VNC desktop session on a compute node. The MATLAB, Mathematica, and RStudio Desktop Apps are special versions of this app. 
To get started choose Remote Desktop (or another desktop app) from the Interactive Apps menu on the dashboard. Use the form to request resources and decide what partition your job should run on. Use devel ( interactive on Milgram) or your lab's partition. Once you launch the job, you will be presented with a notification that your job has been queued. Depending on the resources requested, you may need to wait for a bit. When the job starts you will see the option to launch the Remote Desktop: Note you can share a view only link for your session if you would like to share your screen. After you click on Launch Remote Desktop, a standard desktop interface will open in a new tab.","title":"Remote Desktop"},{"location":"clusters-at-yale/access/ood/#copypaste","text":"In some browsers, you may have to use a special text box to copy and paste from the Remote Desktop App. Click the arrow on the left side of your window for a menu, then click the clipboard icon to get access to your Remote Desktop's clipboard.","title":"Copy/Paste"},{"location":"clusters-at-yale/access/ood/#jupyter","text":"One of the most common uses of Open OnDemand is the Jupyter interface for Python and R. You can choose either Jupyter Notebook or Jupyter Lab. By default, this app will try to launch Jupyter Notebook, unless the Start JupyterLab checkbox is selected. Make sure that you chose the right Conda environment for you from the drop-down menu. If you have not yet set one up, follow our instructions on how to create a new one. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks.","title":"Jupyter"},{"location":"clusters-at-yale/access/ood/#root-directory","text":"The Jupyter root directory is set to your Home when started. Project and Scratch can be accessed via their respective symlinks in Home. If you want to access a directory that cannot be acessed through your home directory, for example Gibbs, you need to create a symlink to that directory in your home directory.","title":"Root directory"},{"location":"clusters-at-yale/access/ood/#ycrc_default","text":"The ycrc_default conda environment will be automatically built when you select it for the first time from Jupyter. You can also build your own Jupyter and make it available to OOD: module load miniconda conda create -n env_name jupyter jupyter-lab ycrc_conda_env.sh update Once created, ycrc_default will not be updated by OOD automatically. It must be updated by the user manually. 
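For reference, the environment-creation commands shown above written out as separate shell steps; this is a sketch, and env_name is a placeholder for a name of your choosing:

# Make conda available on the cluster
module load miniconda
# Create an environment containing Jupyter so OOD can launch it
conda create -n env_name jupyter jupyter-lab
# Make the new environment visible to OOD, as described above
ycrc_conda_env.sh update
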
To update ycrc_default , run the following command from a shell command line: module load miniconda conda update -n ycrc_default jupyter jupyter-lab","title":"ycrc_default"},{"location":"clusters-at-yale/access/ood/#rstudio-server","text":"","title":"RStudio Server"},{"location":"clusters-at-yale/access/ood/#change-user-r-package-path","text":"To change the default path where packages installed by the user are stored, you need to add the following line of code in your $HOME/.bashrc : export R_LIBS_USER = path_to_your_local_r_packages","title":"Change User R Package Path"},{"location":"clusters-at-yale/access/ood/#configure-the-graphic-device","text":"When you plot in a RStudio session, you may encounter the following error: Error in RStudioGD () : Shadow graphics device error: r error 4 ( R code execution error ) In addition: Warning message: In grDevices:::png ( \"/tmp/RtmpcRxRaB/4v3450e3627g4432fa27f516348657267.png\" , : unable to open connection to X11 display '' To fix the problem, you need to configure your RStudio session to use Cairo for plotting. You can do it in your code as follows: options ( bitmapType = 'cairo' ) Alternatively, you can put the above code in .Rprofile in your home directory and the option will be picked up automatically.","title":"Configure the Graphic Device"},{"location":"clusters-at-yale/access/ood/#clean-rstudio","text":"If RStudio becomes slow to respond or completely stops responding, please stop the RStudio session and then run the following script at a shell command line: clean_rstudio.sh This will remove any temporary files created by RStudio and allow it to start anew.","title":"Clean RStudio"},{"location":"clusters-at-yale/access/ood/#troubleshoot-ood","text":"","title":"Troubleshoot OOD"},{"location":"clusters-at-yale/access/ood/#an-ood-session-is-started-and-then-completed-immediately","text":"Check if your quota is full Reset your .bashrc and .bash_profile to their original contents (you can backup the startup files before resetting them. Add the changes back one at a time to see if one or more of the changes would affect OOD from starting properly) Remove the default module collection file $HOME/.lmod.d/default.cluster-rhel8 (cluster is one of the following: grace, mccleary) or $HOME/.lmod.d/default.milgram-rhel7 for Milgram.","title":"An OOD session is started and then completed immediately"},{"location":"clusters-at-yale/access/ood/#remote-desktop-or-matlab-mathematica-etc-cannot-be-started-properly","text":"Make sure there is no initialization left by conda init in your .bashrc . 
Clean it with sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Run dbus-launch and make sure you see the following output: [ pl543@grace1 ~ ] $ which dbus-launch /usr/bin/dbus-launch","title":"Remote Desktop (or MATLAB, Mathematica, etc) cannot be started properly"},{"location":"clusters-at-yale/access/ood/#jupyter-cannot-be-started-properly","text":"If you are trying to launch jupyter-notebook , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-notebook /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook If you are trying to launch jupyter-lab , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-lab /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook","title":"Jupyter cannot be started properly"},{"location":"clusters-at-yale/access/ood/#rstudio-with-conda-r","text":"If you see NOT_FOUND in \"Conda R Environment\", it means your Conda R environment has not been properly installed. You may need to reinstall your Conda R environment and make sure r-base r-essentials are both included.","title":"RStudio with Conda R"},{"location":"clusters-at-yale/access/ood/#rstudio-server-does-not-respond","text":"If you encounter a grey screen after clicking the \"Connect to RStudio Server\" button, please stop the RStudio session and run clean-rstudio.sh at a shell command line.","title":"RStudio Server does not respond"},{"location":"clusters-at-yale/access/ssh/","text":"SSH Connection For more advanced use cases that are not well supported by the Web Portal (Open OnDemand) , you can connect to the cluster over the more traditional SSH connection. Overview Request an account (if you do not already have one). Send us your public SSH key with our SSH key uploader . Allow up to ten minutes for it to propagate. Once we have your public key you can connect with ssh netid@clustername.ycrc.yale.edu . Login node addresses and other details of the clusters, such as scheduler partitions and storage, can be found on the clusters page . To use graphical programs on the clusters, please see our guides on Open OnDemand or X11 Forwarding . If you are having trouble logging in : please read the rest of this page and our Troubleshoot Login page, then contact us if you're still having issues. What are SSH keys SSH (Secure Shell) keys are a set of two pieces of information that you use to identify yourself and encrypt communication to and from a server. Usually this takes the form of two files: a public key (often saved as id_rsa.pub ) and a private key ( id_rsa or id_rsa.ppk ). To use an analogy, your public key is like a lock and your private key is what unlocks it. It is ok for others to see the lock (public key), but anyone who knows the private key can open your lock (and impersonate you). When you connect to a remote server in order to sign in, it will present your lock. You prove your identity by unlocking it with your secret key. As you continue communicating with the remote server, the data sent to you is also locked with your public key such that only you can unlock it with your private key. We use an automated system to distribute your public key onto the clusters, which you can log in to here . It is only accessible on campus or through the Yale VPN . All the public keys that are authorized to your account are stored in the file ~/.ssh/authorized_keys on the clusters you have been given access to. 
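If you ever want to verify what is installed, a quick check can be done from a cluster shell and from your local machine; a minimal sketch (id_rsa.pub is the default key name used on this page):

# On the cluster: list the public keys currently authorized for your account
cat ~/.ssh/authorized_keys
# On your local machine: print the fingerprint of your public key for comparison
ssh-keygen -lf ~/.ssh/id_rsa.pub
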
If you use multiple computers, you can either keep the same ssh key pair on every one or have a different set for each. Having only one is less complicated, but if your key pair is compromised you have to be worried about everywhere it is authorized. Warning Keep your private keys private! Anyone who has them can assume your identity on any server where your keys are authorized. We will never ask for your private key . For further reading we recommend starting with the Wikipedia articles about public-key cryptography and challenge-response authentication . macOS and Linux Generate Your Key Pair on macOS and Linux To generate a new key pair, first open a terminal/xterm session. If you are on macOS, open Applications -> Utilities -> Terminal. Generate your public and private ssh keys. Type the following into the terminal window: ssh-keygen Your terminal should respond: Generating public/private rsa key pair. Enter file in which to save the key (/home/yourusername/.ssh/id_rsa): Press Enter to accept the default value. Your terminal should respond: Enter passphrase (empty for no passphrase): Choose a secure passphrase. Your passphrase will prevent access to your account in the event your private key is stolen. You will not see any characters appear on the screen as you type. The response will be: Enter same passphrase again: Enter the passphrase again. The key pair is generated and written to a directory called .ssh in your home directory. The public key is stored in ~/.ssh/id_rsa.pub . If you forget your passphrase, it cannot be recovered. Instead, you will need to generate and upload a new SSH key pair. Next, upload your public SSH key on the cluster. Run the following command in a terminal: cat ~/.ssh/id_rsa.pub Copy and paste the output to our SSH key uploader . Note: It can take a few minutes for newly uploaded keys to sync out to the clusters so your login may not work immediately. Connect on macOS and Linux Once your key has been copied to the appropriate places on the clusters, you can log in with the command: ssh netid@clustername.ycrc.yale.edu Check out our Advanced SSH Configuration for tips on maintaining connections and adding tab complete to your ssh commands on linux/macOS. Windows We recommend using the Web Portal (Open OnDemand) to connect to the clusters from Windows. If you need advanced features beyond the web portal, we recommend using MobaXterm . MobaXterm You can download, extract & install MobaXterm from this page . We recommend using the \"Installer Edition\", but make sure to extract the zip file before running the installer. You can also use one of the Windows Subsystem for Linux (WSL) distributions and follow the Linux instructions above. However, you will probably run into issues if you try to use any graphical applications. Generate Your Key Pair on Windows First, generate an SSH key pair if you haven't already: Open MobaXterm. From the top menu choose Tools -> MobaKeyGen (SSH key generator). Leave all defaults and click the \"Generate\" button. Wiggle your mouse. Click \"Save public key\" and save your public key as id_rsa.pub. Choose a secure passphrase and enter into the two relevant fields. Your passphrase will prevent access to your account in the event your private key is stolen. Click \"Save private key\" and save your private key as id_rsa.ppk (this one is secret, don't give it to other people ). Copy the text of your public key and paste it into the text box in our SSH key uploader . Your key will be synced out to the clusters in a few minutes. 
Connect with MobaXterm To make a new connection to one of the clusters: Open MobaXterm. From the top menu select Sessions -> New Session. Click the SSH icon in the top left. Enter the cluster login node address (e.g. grace.ycrc.yale.edu) as the Remote Host. Check \"Specify Username\" and Enter your netID as the the username. Click the \"Advanced SSH Settings\" tab and check the \"Use private key box\", then click the file icon / magnifying glass to choose where you saved your private key (id_rsa.ppk). Click OK. In the future, your session should be saved in the sessions bar on the left in the main window.","title":"Connect with SSH"},{"location":"clusters-at-yale/access/ssh/#ssh-connection","text":"For more advanced use cases that are not well supported by the Web Portal (Open OnDemand) , you can connect to the cluster over the more traditional SSH connection.","title":"SSH Connection"},{"location":"clusters-at-yale/access/ssh/#overview","text":"Request an account (if you do not already have one). Send us your public SSH key with our SSH key uploader . Allow up to ten minutes for it to propagate. Once we have your public key you can connect with ssh netid@clustername.ycrc.yale.edu . Login node addresses and other details of the clusters, such as scheduler partitions and storage, can be found on the clusters page . To use graphical programs on the clusters, please see our guides on Open OnDemand or X11 Forwarding . If you are having trouble logging in : please read the rest of this page and our Troubleshoot Login page, then contact us if you're still having issues.","title":"Overview"},{"location":"clusters-at-yale/access/ssh/#what-are-ssh-keys","text":"SSH (Secure Shell) keys are a set of two pieces of information that you use to identify yourself and encrypt communication to and from a server. Usually this takes the form of two files: a public key (often saved as id_rsa.pub ) and a private key ( id_rsa or id_rsa.ppk ). To use an analogy, your public key is like a lock and your private key is what unlocks it. It is ok for others to see the lock (public key), but anyone who knows the private key can open your lock (and impersonate you). When you connect to a remote server in order to sign in, it will present your lock. You prove your identity by unlocking it with your secret key. As you continue communicating with the remote server, the data sent to you is also locked with your public key such that only you can unlock it with your private key. We use an automated system to distribute your public key onto the clusters, which you can log in to here . It is only accessible on campus or through the Yale VPN . All the public keys that are authorized to your account are stored in the file ~/.ssh/authorized_keys on the clusters you have been given access to. If you use multiple computers, you can either keep the same ssh key pair on every one or have a different set for each. Having only one is less complicated, but if your key pair is compromised you have to be worried about everywhere it is authorized. Warning Keep your private keys private! Anyone who has them can assume your identity on any server where your keys are authorized. We will never ask for your private key . 
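If a login fails and you are unsure which private key your SSH client is offering, a verbose connection attempt from your local terminal shows which keys are tried; a sketch (replace netid, the cluster name, and the key path as appropriate for your setup):

# -i points ssh at a specific private key; -v prints which keys are offered during login
ssh -v -i ~/.ssh/id_rsa netid@grace.ycrc.yale.edu
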
For further reading we recommend starting with the Wikipedia articles about public-key cryptography and challenge-response authentication .","title":"What are SSH keys"},{"location":"clusters-at-yale/access/ssh/#macos-and-linux","text":"","title":"macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#generate-your-key-pair-on-macos-and-linux","text":"To generate a new key pair, first open a terminal/xterm session. If you are on macOS, open Applications -> Utilities -> Terminal. Generate your public and private ssh keys. Type the following into the terminal window: ssh-keygen Your terminal should respond: Generating public/private rsa key pair. Enter file in which to save the key (/home/yourusername/.ssh/id_rsa): Press Enter to accept the default value. Your terminal should respond: Enter passphrase (empty for no passphrase): Choose a secure passphrase. Your passphrase will prevent access to your account in the event your private key is stolen. You will not see any characters appear on the screen as you type. The response will be: Enter same passphrase again: Enter the passphrase again. The key pair is generated and written to a directory called .ssh in your home directory. The public key is stored in ~/.ssh/id_rsa.pub . If you forget your passphrase, it cannot be recovered. Instead, you will need to generate and upload a new SSH key pair. Next, upload your public SSH key on the cluster. Run the following command in a terminal: cat ~/.ssh/id_rsa.pub Copy and paste the output to our SSH key uploader . Note: It can take a few minutes for newly uploaded keys to sync out to the clusters so your login may not work immediately.","title":"Generate Your Key Pair on macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#connect-on-macos-and-linux","text":"Once your key has been copied to the appropriate places on the clusters, you can log in with the command: ssh netid@clustername.ycrc.yale.edu Check out our Advanced SSH Configuration for tips on maintaining connections and adding tab complete to your ssh commands on linux/macOS.","title":"Connect on macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#windows","text":"We recommend using the Web Portal (Open OnDemand) to connect to the clusters from Windows. If you need advanced features beyond the web portal, we recommend using MobaXterm .","title":"Windows"},{"location":"clusters-at-yale/access/ssh/#mobaxterm","text":"You can download, extract & install MobaXterm from this page . We recommend using the \"Installer Edition\", but make sure to extract the zip file before running the installer. You can also use one of the Windows Subsystem for Linux (WSL) distributions and follow the Linux instructions above. However, you will probably run into issues if you try to use any graphical applications.","title":"MobaXterm"},{"location":"clusters-at-yale/access/ssh/#generate-your-key-pair-on-windows","text":"First, generate an SSH key pair if you haven't already: Open MobaXterm. From the top menu choose Tools -> MobaKeyGen (SSH key generator). Leave all defaults and click the \"Generate\" button. Wiggle your mouse. Click \"Save public key\" and save your public key as id_rsa.pub. Choose a secure passphrase and enter into the two relevant fields. Your passphrase will prevent access to your account in the event your private key is stolen. Click \"Save private key\" and save your private key as id_rsa.ppk (this one is secret, don't give it to other people ). 
Copy the text of your public key and paste it into the text box in our SSH key uploader . Your key will be synced out to the clusters in a few minutes.","title":"Generate Your Key Pair on Windows"},{"location":"clusters-at-yale/access/ssh/#connect-with-mobaxterm","text":"To make a new connection to one of the clusters: Open MobaXterm. From the top menu select Sessions -> New Session. Click the SSH icon in the top left. Enter the cluster login node address (e.g. grace.ycrc.yale.edu) as the Remote Host. Check \"Specify Username\" and Enter your netID as the the username. Click the \"Advanced SSH Settings\" tab and check the \"Use private key box\", then click the file icon / magnifying glass to choose where you saved your private key (id_rsa.ppk). Click OK. In the future, your session should be saved in the sessions bar on the left in the main window.","title":"Connect with MobaXterm"},{"location":"clusters-at-yale/access/vnc/","text":"VNC As an alternative to X11 Forwarding, using VNC to access the cluster is another way to run graphically intensive applications. Open OnDemand On the clusters, we have web dashboards set up that can run VNC for you as a job and forward your session back to you via your browser using Open OnDemand . To use the Remote Desktop tab, browse under the \"interactive apps\" drop-down menu item. We strongly encourage using Open OnDemand unless you have specific requirements otherwise. Setup vncserver on a Cluster Connect to the cluster with X11 forwarding enabled. If on Linux or Mac, ssh -Y netid@cluster , or if on Windows, follow our X11 forwarding guide . Start an interactive job on cluster with the --x11 flag (see Slurm for more information). For this description, we\u2019ll assume you were given node r801u30n01: salloc --x11 On that node, run the VNCserver. You\u2019ll see something like: r801u30n01.grace$ vncserver New 'r801u30n01.grace.ycrc.yale.edu:1 (kln26)' desktop is r801u30n01.grace.ycrc.yale.edu:1 Creating default startup script /home/kln26/.vnc/xstartup Starting applications specified in /home/kln26/.vnc/xstartup Log file is /home/kln26/.vnc/r801u30n01.grace.ycrc.yale.edu:1.log The :1 means that your DISPLAY is :1. You\u2019ll need that later, so note it. The first time you run \"vncserver\", you\u2019ll also be asked to select a password for allowing access. On MacOS, if connecting with TurboVNC throws a security exception such as \"javax.net.ssl.SSLHandshakeException\", try adding the SecurityTypes option when starting vncserver on the cluster: vncserver -SecurityTypes VNC,OTP,UnixLogin,None Connect from your local machine (laptop/desktop) macOs/Linux From a shell on your local machine, run the following ssh command: ssh -Y -L7777:r801u30n01:5901 YourNetID@cluster_login_node This will set up a tunnel from your local port 7777 to port 5901 on r801u30n01. You will need to customize this command to your situation. The 5901 is for display :1. In general, you should put 5900+DISPLAY. The 7777 is arbitrary; any number above 3000 will likely work. You\u2019ll need the number you chose for the next step. On your local machine, start the vncviewer application. Depending on your local operating system, you may need to install this. We recommend TurboVNC for Mac. When you start the viewer, you\u2019ll need to tell it which host and port to attach to. You want to specify the local end of the tunnel. In the above case, that would be localhost::7777. Exactly how you specify this will depend a bit on which viewer you use. 
E.g: vncviewer localhost::7777 You should be prompted for the password you set when you started the server. Now you are in a GUI environment and can run IGV or any other rich GUI application. /home/bioinfo/software/IGV/IGV_2.2.0/igv.sh Windows In MobaXterm, create a new Session (available in the menu bar) and then select the VNC session. To fill out the VNC Session setup, click the \"Network settings\" tab and check the box for \"Connect through SSH gateway (jump host). Then fill out the boxes as follows: Remote hostname or IP Address: name of the node running your VNC server (e.g. r801u30n01) Port: 5900 + the DISPLAY number from above (e.g. 5901 for DISPLAY = 1 ) Gateway SSH server: ssh address of the cluster (e.g. grace.ycrc.yale.edu) Port: 22 (should be default) User: netid Use private key: check this box and click to point to your private key file you use to connect to the cluster When you are done, click OK. If promoted for a password for \"localhost\", provide the vncserver password you specified in the previous step. If the VNC server looks very pixelated and your mouse movements seem laggy, try clicking the \"Toggle scaling\" button at the top of the VNC window. Example Configuration: Clean Up When you are all finished, you can kill the vncserver by doing this in the same shell you used to start it (replace :1 by your display number): vncserver -kill :1","title":"VNC"},{"location":"clusters-at-yale/access/vnc/#vnc","text":"As an alternative to X11 Forwarding, using VNC to access the cluster is another way to run graphically intensive applications.","title":"VNC"},{"location":"clusters-at-yale/access/vnc/#open-ondemand","text":"On the clusters, we have web dashboards set up that can run VNC for you as a job and forward your session back to you via your browser using Open OnDemand . To use the Remote Desktop tab, browse under the \"interactive apps\" drop-down menu item. We strongly encourage using Open OnDemand unless you have specific requirements otherwise.","title":"Open OnDemand"},{"location":"clusters-at-yale/access/vnc/#setup-vncserver-on-a-cluster","text":"Connect to the cluster with X11 forwarding enabled. If on Linux or Mac, ssh -Y netid@cluster , or if on Windows, follow our X11 forwarding guide . Start an interactive job on cluster with the --x11 flag (see Slurm for more information). For this description, we\u2019ll assume you were given node r801u30n01: salloc --x11 On that node, run the VNCserver. You\u2019ll see something like: r801u30n01.grace$ vncserver New 'r801u30n01.grace.ycrc.yale.edu:1 (kln26)' desktop is r801u30n01.grace.ycrc.yale.edu:1 Creating default startup script /home/kln26/.vnc/xstartup Starting applications specified in /home/kln26/.vnc/xstartup Log file is /home/kln26/.vnc/r801u30n01.grace.ycrc.yale.edu:1.log The :1 means that your DISPLAY is :1. You\u2019ll need that later, so note it. The first time you run \"vncserver\", you\u2019ll also be asked to select a password for allowing access. 
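If you later lose track of which display number your server is using, most VNC server implementations (including TigerVNC and TurboVNC) can list your running sessions; a sketch, run on the compute node where the server was started:

# List the VNC sessions you have running on this node, with their display numbers
vncserver -list
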
On MacOS, if connecting with TurboVNC throws a security exception such as \"javax.net.ssl.SSLHandshakeException\", try adding the SecurityTypes option when starting vncserver on the cluster: vncserver -SecurityTypes VNC,OTP,UnixLogin,None","title":"Setup vncserver on a Cluster"},{"location":"clusters-at-yale/access/vnc/#connect-from-your-local-machine-laptopdesktop","text":"","title":"Connect from your local machine (laptop/desktop)"},{"location":"clusters-at-yale/access/vnc/#macoslinux","text":"From a shell on your local machine, run the following ssh command: ssh -Y -L7777:r801u30n01:5901 YourNetID@cluster_login_node This will set up a tunnel from your local port 7777 to port 5901 on r801u30n01. You will need to customize this command to your situation. The 5901 is for display :1. In general, you should put 5900+DISPLAY. The 7777 is arbitrary; any number above 3000 will likely work. You\u2019ll need the number you chose for the next step. On your local machine, start the vncviewer application. Depending on your local operating system, you may need to install this. We recommend TurboVNC for Mac. When you start the viewer, you\u2019ll need to tell it which host and port to attach to. You want to specify the local end of the tunnel. In the above case, that would be localhost::7777. Exactly how you specify this will depend a bit on which viewer you use. E.g: vncviewer localhost::7777 You should be prompted for the password you set when you started the server. Now you are in a GUI environment and can run IGV or any other rich GUI application. /home/bioinfo/software/IGV/IGV_2.2.0/igv.sh","title":"macOs/Linux"},{"location":"clusters-at-yale/access/vnc/#windows","text":"In MobaXterm, create a new Session (available in the menu bar) and then select the VNC session. To fill out the VNC Session setup, click the \"Network settings\" tab and check the box for \"Connect through SSH gateway (jump host). Then fill out the boxes as follows: Remote hostname or IP Address: name of the node running your VNC server (e.g. r801u30n01) Port: 5900 + the DISPLAY number from above (e.g. 5901 for DISPLAY = 1 ) Gateway SSH server: ssh address of the cluster (e.g. grace.ycrc.yale.edu) Port: 22 (should be default) User: netid Use private key: check this box and click to point to your private key file you use to connect to the cluster When you are done, click OK. If promoted for a password for \"localhost\", provide the vncserver password you specified in the previous step. If the VNC server looks very pixelated and your mouse movements seem laggy, try clicking the \"Toggle scaling\" button at the top of the VNC window. Example Configuration:","title":"Windows"},{"location":"clusters-at-yale/access/vnc/#clean-up","text":"When you are all finished, you can kill the vncserver by doing this in the same shell you used to start it (replace :1 by your display number): vncserver -kill :1","title":"Clean Up"},{"location":"clusters-at-yale/access/vpn/","text":"Access from Off Campus (VPN) Yale's clusters can only be accessed on the Yale network. Therefore, in order to access a cluster from off campus, you will need to first connect to Yale's VPN. More information about Yale's VPN can be found on the ITS website . VPN Software Windows and macOS We recommend the Cisco AnyConnect VPN Client, which can be downloaded from the ITS Software Library . Linux On Linux, you can use openconnect to connect to one of Yale's VPNs. If you are using the standard Gnome-based distros, use the commands below to install. 
Ubuntu/Debian sudo apt install network-manager-openconnect-gnome Fedora/CentOS sudo yum install NetworkManager-openconnect Connect via VPN You will need to connect via the VPN client using the profile \"access.yale.edu\". Multi-factor Authentication (MFA) Authentication for the VPN requires multi-factor authentication via Duo in addition to your normal Yale credentials (email address and netid password). After you select \"Connect\" in the above dialog box, it will launch a web page with a prompt to login with your email address, netid password and MFA method. You can click \"Other options\" to choose your authentication method. If you choose \"Duo Push\", simply tap \"Approve\" on your mobile device. If you choose \"Duo Mobile passcode\", enter the passcode from the Duo Mobile app. If you choose \"Phone call\", follow the prompts when you are called. Once you successfully authenticate with MFA, you will be connected to the VPN and should be able to log in the clusters via SSH and Open OnDemand as usual. More information about MFA at Yale can be found on the ITS website .","title":"Access from Off Campus (VPN)"},{"location":"clusters-at-yale/access/vpn/#access-from-off-campus-vpn","text":"Yale's clusters can only be accessed on the Yale network. Therefore, in order to access a cluster from off campus, you will need to first connect to Yale's VPN. More information about Yale's VPN can be found on the ITS website .","title":"Access from Off Campus (VPN)"},{"location":"clusters-at-yale/access/vpn/#vpn-software","text":"","title":"VPN Software"},{"location":"clusters-at-yale/access/vpn/#windows-and-macos","text":"We recommend the Cisco AnyConnect VPN Client, which can be downloaded from the ITS Software Library .","title":"Windows and macOS"},{"location":"clusters-at-yale/access/vpn/#linux","text":"On Linux, you can use openconnect to connect to one of Yale's VPNs. If you are using the standard Gnome-based distros, use the commands below to install. Ubuntu/Debian sudo apt install network-manager-openconnect-gnome Fedora/CentOS sudo yum install NetworkManager-openconnect","title":"Linux"},{"location":"clusters-at-yale/access/vpn/#connect-via-vpn","text":"You will need to connect via the VPN client using the profile \"access.yale.edu\".","title":"Connect via VPN"},{"location":"clusters-at-yale/access/vpn/#multi-factor-authentication-mfa","text":"Authentication for the VPN requires multi-factor authentication via Duo in addition to your normal Yale credentials (email address and netid password). After you select \"Connect\" in the above dialog box, it will launch a web page with a prompt to login with your email address, netid password and MFA method. You can click \"Other options\" to choose your authentication method. If you choose \"Duo Push\", simply tap \"Approve\" on your mobile device. If you choose \"Duo Mobile passcode\", enter the passcode from the Duo Mobile app. If you choose \"Phone call\", follow the prompts when you are called. Once you successfully authenticate with MFA, you will be connected to the VPN and should be able to log in the clusters via SSH and Open OnDemand as usual. 
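For Linux users who installed openconnect as described above, the connection can also be made directly from a terminal; a minimal sketch (the exact flags and authentication prompts depend on your openconnect version and the VPN configuration, so treat this only as a starting point):

# Connect to the Yale VPN profile mentioned above; you will be prompted to authenticate
sudo openconnect --protocol=anyconnect access.yale.edu
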
More information about MFA at Yale can be found on the ITS website .","title":"Multi-factor Authentication (MFA)"},{"location":"clusters-at-yale/access/x11/","text":"Graphical Interfaces (X11) To use a graphical interface on the clusters and you choose not to use the web portal , your connection needs to be set up for X11 forwarding, which will transmit the graphical window from the cluster back to your local machine. A simple test to see if your setup is working is to run the command xclock . You should see a simple analog clock window pop up. On macOS Download and install the latest X-Quartz release. Log out and log back in to your Mac to reset some variables When using ssh to log in to the clusters, use the -Y option to enable X11 forwarding. Example: ssh -Y netid@grace.ycrc.yale.edu Note: if you get the error \"cannot open display\", please open an X-Quartz terminal and run the following command, and then log in to the cluster from the X-Quartz terminal: launchctl load -w /Library/LaunchAgents/org.macosforge.xquartz.startx.plist On Windows We recommend MobaXterm for connecting to the clusters from Windows. It is configured for X11 forwarding out of the box and should require no additional configuration or software. Quick Test A quick and simple test to check if X11 forwarding is working is to run the command xclock in the session you expect to be forwarding. After a short delay, you should see a window with a simple clock pop up. Submit an X11 enabled Job Once configured, you'll usually want to use X11 forwarding on a compute node to do your work. To allocate a simple interactive session with X11 forwarding: salloc --x11 For more Slurm options, see our Slurm documentation .","title":"Graphical Interfaces (X11)"},{"location":"clusters-at-yale/access/x11/#graphical-interfaces-x11","text":"To use a graphical interface on the clusters and you choose not to use the web portal , your connection needs to be set up for X11 forwarding, which will transmit the graphical window from the cluster back to your local machine. A simple test to see if your setup is working is to run the command xclock . You should see a simple analog clock window pop up.","title":"Graphical Interfaces (X11)"},{"location":"clusters-at-yale/access/x11/#on-macos","text":"Download and install the latest X-Quartz release. Log out and log back in to your Mac to reset some variables When using ssh to log in to the clusters, use the -Y option to enable X11 forwarding. Example: ssh -Y netid@grace.ycrc.yale.edu Note: if you get the error \"cannot open display\", please open an X-Quartz terminal and run the following command, and then log in to the cluster from the X-Quartz terminal: launchctl load -w /Library/LaunchAgents/org.macosforge.xquartz.startx.plist","title":"On macOS"},{"location":"clusters-at-yale/access/x11/#on-windows","text":"We recommend MobaXterm for connecting to the clusters from Windows. It is configured for X11 forwarding out of the box and should require no additional configuration or software.","title":"On Windows"},{"location":"clusters-at-yale/access/x11/#quick-test","text":"A quick and simple test to check if X11 forwarding is working is to run the command xclock in the session you expect to be forwarding. After a short delay, you should see a window with a simple clock pop up.","title":"Quick Test"},{"location":"clusters-at-yale/access/x11/#submit-an-x11-enabled-job","text":"Once configured, you'll usually want to use X11 forwarding on a compute node to do your work. 
To allocate a simple interactive session with X11 forwarding: salloc --x11 For more Slurm options, see our Slurm documentation .","title":"Submit an X11 enabled Job"},{"location":"clusters-at-yale/applications/","text":"Overview Software Modules The YCRC will install and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our module guide for more info. You can run module avail to page through all available software once you log in. Conda, Python & R You should also feel free to install things for yourself. See our Conda , Python , R guides for guidance on running these on the clusters. Compile Your Own Software For all other software, we encourage users to attempt to install their own software into their directories. Here are instructions for common software procedures. Make Cmake Apptainer : create containers and port Docker containers to the clusters (formerly know as \"Singularity\") If you run into issues with your software installations, contact us . Software Guides We provide additional guides for running specific software on the clusters as well.","title":"Overview"},{"location":"clusters-at-yale/applications/#overview","text":"","title":"Overview"},{"location":"clusters-at-yale/applications/#software-modules","text":"The YCRC will install and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our module guide for more info. You can run module avail to page through all available software once you log in.","title":"Software Modules"},{"location":"clusters-at-yale/applications/#conda-python-r","text":"You should also feel free to install things for yourself. See our Conda , Python , R guides for guidance on running these on the clusters.","title":"Conda, Python & R"},{"location":"clusters-at-yale/applications/#compile-your-own-software","text":"For all other software, we encourage users to attempt to install their own software into their directories. Here are instructions for common software procedures. Make Cmake Apptainer : create containers and port Docker containers to the clusters (formerly know as \"Singularity\") If you run into issues with your software installations, contact us .","title":"Compile Your Own Software"},{"location":"clusters-at-yale/applications/#software-guides","text":"We provide additional guides for running specific software on the clusters as well.","title":"Software Guides"},{"location":"clusters-at-yale/applications/compile/","text":"Build Software How to get software you need up and running on the clusters. caveat emptor We recommend either use existing software modules , Conda , Apptainer , or pre-compiled software where available. However, there are cases where compiling applications is necessary or desired. This can be because the pre-compiled version isn't readily available/compatible or because compiling applications on the cluster will make an appreciable difference in performance. It is also the case that many R packages are compiled at install time. When compiling applications on the clusters, it is important to consider the ways in which you expect to run the application you are endeavoring to get working. 
If you want to be able to run jobs calling your application any node on the cluster, you will need to target the oldest hardware available so that newer optimizations are not used that will fail on some nodes. If your application is already quite specialized (e.g. needs GPUs or brand-new CPU instructions), you will want to compile it natively for the subset of compute nodes on which your jobs will run. This decision is often a trade-off between faster individual jobs or jobs that can run on more nodes at once. Each of the cluster pages (see the HPC Resources page for a list) has a \"Compute Node Configurations\" section where nodes are roughly listed from oldest to newest. Illegal Instruction Instructions You may find that software compiled on newer compute nodes will fail with the error Illegal instruction (core dumped) . This includes R/Python libraries that include code that compiles from source. To remedy this issue make sure to always either: Build or install software on the oldest available nodes. You can ensure you are on the oldest hardware by specifying the oldest feature ( --constraint oldest ) in your job submission. Require that your jobs running the software in question request similar hardware to their build environment. If your software needs newer instructions using avx512 as a constraint will probably work, but limit the pool of nodes your job can run on. Either way, you will want to control where your jobs run with job constraints . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always compile in an interactive job submitted with the --constraint oldest Slurm flag if you want to ensure your program will run on all generations of the compute nodes. Conventions Local Install Because you don't have admin/root/sudo privileges on the clusters, you won't be able to use sudo and a package manager like apt , yum , etc.; You will need to adapt install instructions to allow for what is called a local or user install. If you prefer or require this method, you should create a container image (see our Apptainer guide ), then run it on the cluster. For things to work smoothly you will need to choose and stick with a prefix, or path to your installed applications and libraries. We recommend this be either in your home or project directory, something like ~/software or /path/to/project/software . Make sure you have created it before continuing. Tip If you choose a project directory prefix, it will be easier to share your applications with lab mates or other cluster users. Just make sure to use the true path (the one returned by mydirectories ). Once you've chosen a prefix you will want to add any directory with executables you want to run to your PATH environment variable, and any directores with libraries that your application(s) link to your LD_LIBRARY_PATH environment variable. Each of these tell your shell where to look when you call your application without specifying an absolute path to it. To set these variables permanently, add the following to the end of your ~/.bashrc file: # local installs export MY_PREFIX = ~/software export PATH = $MY_PREFIX /bin: $PATH export LD_LIBRARY_PATH = $MY_PREFIX /lib: $LD_LIBRARY_PATH For the remainder of the guide we'll use the $MY_PREFIX variable to refer to the prefix. See below or your application's install instructions for exactly how to specify your prefix at build/install time. Dependencies You will need to develop a build strategy that works for you and stay consistent. 
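Before going further into dependencies, here is a sketch pulling together the constraint and prefix advice above; the module version is an example taken from this guide, and the package being built and the prefix path are placeholders:

# Build on the oldest hardware so the result runs on every node generation
salloc --constraint oldest

# Load a compiler toolchain to build against (version is an example from this guide)
module load GCCcore/10.2.0

# Configure, build, and install into a local prefix as described above
export MY_PREFIX=~/software
./configure --prefix=$MY_PREFIX && make && make install
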
If you're happy using libraries and toolchains that are already available on the cluster as dependencies (recommended), feel free to create module collections that serve as your environments. If you prefer to completely build your own software tree, that is ok too. Whichever route you choose, try to stick with the same version of dependencies (e.g. MPI, zlib, numpy) and compiler you're using (e.g. GCC, intel). We find that unless absolutely necessary, the newest version of a compiler or library might not be the most compatible with a wide array of scientific software so you may want to step back a few versions or try using what was available at the time your application was being developed. Autotools ( configure / make ) If your application includes instructions to run ./bootstrap , ./autogen.sh , ./configure or make , it is using the GNU Build System . Warning If you are using GCC 10+, you will need to load a separate Autotools module for your version of GCC; e.g., module load Autotools/20200321-GCCcore-10.2.0 configure If you are instructed to run ./configure to generate a Makefile, specify your prefix with the --prefix option. This creates a file, usually named Makefile that is a recipe for make to use to build your application. export MY_PREFIX = ~/software ./configure --prefix = $MY_PREFIX make install If your configure ran properly, make install should properly place your application in your prefix directory. If there is no install target specified for your application, you can either run make and copy the application to your $MY_PREFIX/bin directory or build it somewhere in $MY_PREFIX and add its relevant paths to your PATH and/or LD_LIBRARY_PATH environment variables in your ~/.bashrc file as shown in the local install section. CMake CMake is a popular cross-platform build system. On a linux system, CMake will create a Makefile in a step analogous to ./configure . It is common to create a build directory then run the cmake and make commands from there. Below is what installing to your $MY_DIRECTORY prefix might look like with CMake. CMake instructions also tend to link together the build process onto on line with && , which tells your shell to only continue to the next command if the previous one exited without error. export MY_PREFIX = ~/software mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX = $MY_PREFIX .. && make && make install","title":"Build Software"},{"location":"clusters-at-yale/applications/compile/#build-software","text":"How to get software you need up and running on the clusters.","title":"Build Software"},{"location":"clusters-at-yale/applications/compile/#caveat-emptor","text":"We recommend either use existing software modules , Conda , Apptainer , or pre-compiled software where available. However, there are cases where compiling applications is necessary or desired. This can be because the pre-compiled version isn't readily available/compatible or because compiling applications on the cluster will make an appreciable difference in performance. It is also the case that many R packages are compiled at install time. When compiling applications on the clusters, it is important to consider the ways in which you expect to run the application you are endeavoring to get working. If you want to be able to run jobs calling your application any node on the cluster, you will need to target the oldest hardware available so that newer optimizations are not used that will fail on some nodes. If your application is already quite specialized (e.g. 
needs GPUs or brand-new CPU instructions), you will want to compile it natively for the subset of compute nodes on which your jobs will run. This decision is often a trade-off between faster individual jobs or jobs that can run on more nodes at once. Each of the cluster pages (see the HPC Resources page for a list) has a \"Compute Node Configurations\" section where nodes are roughly listed from oldest to newest.","title":"caveat emptor"},{"location":"clusters-at-yale/applications/compile/#illegal-instruction-instructions","text":"You may find that software compiled on newer compute nodes will fail with the error Illegal instruction (core dumped) . This includes R/Python libraries that include code that compiles from source. To remedy this issue make sure to always either: Build or install software on the oldest available nodes. You can ensure you are on the oldest hardware by specifying the oldest feature ( --constraint oldest ) in your job submission. Require that your jobs running the software in question request similar hardware to their build environment. If your software needs newer instructions using avx512 as a constraint will probably work, but limit the pool of nodes your job can run on. Either way, you will want to control where your jobs run with job constraints . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always compile in an interactive job submitted with the --constraint oldest Slurm flag if you want to ensure your program will run on all generations of the compute nodes.","title":"Illegal Instruction Instructions"},{"location":"clusters-at-yale/applications/compile/#conventions","text":"","title":"Conventions"},{"location":"clusters-at-yale/applications/compile/#local-install","text":"Because you don't have admin/root/sudo privileges on the clusters, you won't be able to use sudo and a package manager like apt , yum , etc.; You will need to adapt install instructions to allow for what is called a local or user install. If you prefer or require this method, you should create a container image (see our Apptainer guide ), then run it on the cluster. For things to work smoothly you will need to choose and stick with a prefix, or path to your installed applications and libraries. We recommend this be either in your home or project directory, something like ~/software or /path/to/project/software . Make sure you have created it before continuing. Tip If you choose a project directory prefix, it will be easier to share your applications with lab mates or other cluster users. Just make sure to use the true path (the one returned by mydirectories ). Once you've chosen a prefix you will want to add any directory with executables you want to run to your PATH environment variable, and any directores with libraries that your application(s) link to your LD_LIBRARY_PATH environment variable. Each of these tell your shell where to look when you call your application without specifying an absolute path to it. To set these variables permanently, add the following to the end of your ~/.bashrc file: # local installs export MY_PREFIX = ~/software export PATH = $MY_PREFIX /bin: $PATH export LD_LIBRARY_PATH = $MY_PREFIX /lib: $LD_LIBRARY_PATH For the remainder of the guide we'll use the $MY_PREFIX variable to refer to the prefix. 
See below or your application's install instructions for exactly how to specify your prefix at build/install time.","title":"Local Install"},{"location":"clusters-at-yale/applications/compile/#dependencies","text":"You will need to develop a build strategy that works for you and stay consistent. If you're happy using libraries and toolchains that are already available on the cluster as dependencies (recommended), feel free to create module collections that serve as your environments. If you prefer to completely build your own software tree, that is ok too. Whichever route you choose, try to stick with the same version of dependencies (e.g. MPI, zlib, numpy) and compiler you're using (e.g. GCC, intel). We find that unless absolutely necessary, the newest version of a compiler or library might not be the most compatible with a wide array of scientific software so you may want to step back a few versions or try using what was available at the time your application was being developed.","title":"Dependencies"},{"location":"clusters-at-yale/applications/compile/#autotools-configuremake","text":"If your application includes instructions to run ./bootstrap , ./autogen.sh , ./configure or make , it is using the GNU Build System . Warning If you are using GCC 10+, you will need to load a separate Autotools module for your version of GCC; e.g., module load Autotools/20200321-GCCcore-10.2.0","title":"Autotools (configure/make)"},{"location":"clusters-at-yale/applications/compile/#configure","text":"If you are instructed to run ./configure to generate a Makefile, specify your prefix with the --prefix option. This creates a file, usually named Makefile that is a recipe for make to use to build your application. export MY_PREFIX = ~/software ./configure --prefix = $MY_PREFIX","title":"configure"},{"location":"clusters-at-yale/applications/compile/#make-install","text":"If your configure ran properly, make install should properly place your application in your prefix directory. If there is no install target specified for your application, you can either run make and copy the application to your $MY_PREFIX/bin directory or build it somewhere in $MY_PREFIX and add its relevant paths to your PATH and/or LD_LIBRARY_PATH environment variables in your ~/.bashrc file as shown in the local install section.","title":"make install"},{"location":"clusters-at-yale/applications/compile/#cmake","text":"CMake is a popular cross-platform build system. On a linux system, CMake will create a Makefile in a step analogous to ./configure . It is common to create a build directory then run the cmake and make commands from there. Below is what installing to your $MY_DIRECTORY prefix might look like with CMake. CMake instructions also tend to link together the build process onto on line with && , which tells your shell to only continue to the next command if the previous one exited without error. export MY_PREFIX = ~/software mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX = $MY_PREFIX .. && make && make install","title":"CMake"},{"location":"clusters-at-yale/applications/lifecycle/","text":"Software Module Lifecycle To keep the YCRC cluster software modules catalogs tidy, relevant, and up to date, we periodically deprecate and introduce modules. 
Deprecated Modules The two major criteria we use to decide which modules to deprecate are: A software module has not been used much in the past year We are ending support for the toolchain with which a module was built As we deprecate modules, every time you load a module that has been marked for removal, a warning message will appear. The message states when the module will no longer appear in the module list. If you see such a message, we recommend you update your project to use a supported module as soon as possible or contact us for help. Toolchain Support The YCRC maintains a rolling two toolchain version support model. At any given time on a cluster, we aim to support two versions of each of the major toolchains, foss and intel . The two versions are separated by two years and new software is typically installed with the later version. When we introduce a new toolchain version, we phase out support for the oldest by marking software in that toolchain for deprecation. A few months later, software in the oldest toolchain version will be removed from the module list and no longer supported by the YCRC.","title":"Module Lifecycle"},{"location":"clusters-at-yale/applications/lifecycle/#software-module-lifecycle","text":"To keep the YCRC cluster software modules catalogs tidy, relevant, and up to date, we periodically deprecate and introduce modules.","title":"Software Module Lifecycle"},{"location":"clusters-at-yale/applications/lifecycle/#deprecated-modules","text":"The two major criteria we use to decide which modules to deprecate are: A software module has not been used much in the past year We are ending support for the toolchain with which a module was built As we deprecate modules, every time you load a module that has been marked for removal, a warning message will appear. The message states when the module will no longer appear in the module list. If you see such a message, we recommend you update your project to use a supported module as soon as possible or contact us for help.","title":"Deprecated Modules"},{"location":"clusters-at-yale/applications/lifecycle/#toolchain-support","text":"The YCRC maintains a rolling two toolchain version support model. At any given time on a cluster, we aim to support two versions of each of the major toolchains, foss and intel . The two versions are separated by two years and new software is typically installed with the later version. When we introduce a new toolchain version, we phase out support for the oldest by marking software in that toolchain for deprecation. A few months later, software in the oldest toolchain version will be removed from the module list and no longer supported by the YCRC.","title":"Toolchain Support"},{"location":"clusters-at-yale/applications/modules/","text":"Load Software with Modules To facilitate the diverse work that happens on the YCRC clusters we compile, install, and manage software packages separately from those installed in standard system directories. We use EasyBuild to build, install, and manage packages. You can access these packages as Lmod modules. The modules involving compiled software are arranged into hierarchical toolchains that make dependencies more consistent when you load multiple modules. Warning Avoid loading Python or R modules simultaneously with conda environments. This will almost always break something. Find Modules All Available Modules To list all available modules, run: module avail Search For Modules You can search for modules or extensions with spider and avail . 
For example, to find and list all Python version 3 modules, run: module avail python/3 To find any module or extension that mentions python in its name or description, use the command: module spider python Get Module Help You can get a brief description of a module and the URL to the software's homepage by running: module help modulename/version If you don't find a commonly used software package you require, contact us with a software installation request. Otherwise, check out our installation guides to install it for yourself. Load and Unload Modules Load The module load command modifies your environment so you can use the specified software package(s). This command is case-sensitive to module names. The module load command will load dependencies as needed; you don't need to load them separately. For batch jobs , add module load command(s) to your submission script. For example, to load Python version 3.8.6 and BLAST+ version 2.11.0 , find modules with matching toolchain suffixes and run the command: module load Python/3.8.6-GCCcore-10.2.0 BLAST+/2.11.0-GCCcore-10.2.0 Lmod will add python and the BLAST commands to your environment. Since both of these modules were built with the GCCcore/10.2.0 toolchain module, they will not load conflicting libraries. Recall you can see the other modules that were loaded by running module list . Module Defaults As new versions of software get installed and others are deprecated , the default module version can change over time. It is best practice to note the specific module versions you are using for a project and load those explicitly, e.g. module load Python/3.8.6-GCCcore-10.2.0 not module load Python . This makes your work more reproducible and less likely to change unexpectedly in the future. Unload You can also unload a specific module that you've previously loaded: module unload R Or unload all modules at once with: module purge Purge Lightly module purge will alert you to a sticky module that is always loaded called StdEnv . Avoid unloading StdEnv unless explicitly told to do so; otherwise you will lose some important setup for the cluster you are on. Module Collections Save Collections It can be a pain to enter a long list of modules every time you return to a project. Module collections allow you to create sets of modules to load together. This method is particularly useful if you have two or more module sets that may conflict with one another. Save a collection of modules by first loading all the modules you want to save together then run: module save environment_name (replace environment_name with something more meaningful to you) Restore Collections Load a collection with module restore : module restore environment_name To modify a collection: restore it, make the desired changes by load ing and/or unload ing modules, then save it to the same name. List Collections To get a list of your collections, run: module savelist ml : A Convenient Tool Lmod provides a convenient tool called ml to simplify all of the module commands. List Module Loaded ml Load Modules ml Python/3.8.6-GCCcore-10.2.0 Unload Modules ml -Python With module Sub-commands ml can be used to replace the module command. It can take all the sub-commands from module and works the same way as module does. 
ml load Python R ml unload Python ml spider Python ml avail ml whatis Python ml key Python ml purge ml save test ml restore test Environment Variables To refer to the directory where the software from a module is stored, you can use the environment variable $EBROOTMODULENAME where MODULENAME is the name of the module in all caps with no spaces. This can be useful for finding the executables, libraries, or readme files that are included with the software: [ netid@node ~ ] $ module load SAMtools [ netid@node ~ ] $ echo $EBVERSIONSAMTOOLS 1 .11 [ netid@node ~ ] $ ls $EBROOTSAMTOOLS bin easybuild include lib lib64 share [ netid@node ~ ] $ ls $EBROOTSAMTOOLS /bin ace2sam maq2sam-short psl2sam.pl soap2sam.pl blast2sam.pl md5fa r2plot.lua vcfutils.lua bowtie2sam.pl md5sum-lite sam2vcf.pl wgsim export2sam.pl novo2sam.pl samtools wgsim_eval.pl interpolate_sam.pl plot-ampliconstats samtools.pl zoom2sam.pl maq2sam-long plot-bamstats seq_cache_populate.pl Further Reading You can view documentation while on the cluster using the command: man module There is even more information at the offical Lmod website and related documentation .","title":"Software Modules"},{"location":"clusters-at-yale/applications/modules/#load-software-with-modules","text":"To facilitate the diverse work that happens on the YCRC clusters we compile, install, and manage software packages separately from those installed in standard system directories. We use EasyBuild to build, install, and manage packages. You can access these packages as Lmod modules. The modules involving compiled software are arranged into hierarchical toolchains that make dependencies more consistent when you load multiple modules. Warning Avoid loading Python or R modules simultaneously with conda environments. This will almost always break something.","title":"Load Software with Modules"},{"location":"clusters-at-yale/applications/modules/#find-modules","text":"","title":"Find Modules"},{"location":"clusters-at-yale/applications/modules/#all-available-modules","text":"To list all available modules, run: module avail","title":"All Available Modules"},{"location":"clusters-at-yale/applications/modules/#search-for-modules","text":"You can search for modules or extensions with spider and avail . For example, to find and list all Python version 3 modules, run: module avail python/3 To find any module or extension that mentions python in its name or description, use the command: module spider python","title":"Search For Modules"},{"location":"clusters-at-yale/applications/modules/#get-module-help","text":"You can get a brief description of a module and the url to the software's homepage by running: module help modulename/version If you don't find a commonly used software package you require, contact us with a software installation request. Otherwise, check out our installation guides to install it for yourself.","title":"Get Module Help"},{"location":"clusters-at-yale/applications/modules/#load-and-unload-modules","text":"","title":"Load and Unload Modules"},{"location":"clusters-at-yale/applications/modules/#load","text":"The module load command modifies your environment so you can use the specified software package(s). This command is case-sensitive to module names. The module load command will load dependencies as needed, you don't need to load them separately. For batch jobs , add module load command(s) to your submission script. 
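A minimal sketch of what that looks like in a submission script (the job options and script name are illustrative placeholders, not a recommended configuration):

#!/bin/bash
#SBATCH --job-name=my_analysis             # illustrative job name
#SBATCH --time=01:00:00                    # illustrative walltime
module load Python/3.8.6-GCCcore-10.2.0    # the module version used in the example below
python my_script.py                        # my_script.py stands in for your own program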
For example, to load Python version 3.8.6 and BLAST+ version 2.11.0 , find modules with matching toolchain suffixes and run the command: module load Python/3.8.6-GCCcore-10.2.0 BLAST+/2.11.0-GCCcore-10.2.0 Lmod will add python and the BLAST commands to your environment. Since both of these modules were built with the GCCcore/10.2.0 toolchain module, they will not load conflicting libraries. Recall you can see the other modules that were loaded by running module list . Module Defaults As new versions of software get installed and others are deprecated , the default module version can change over time. It is best practice to note the specific module versions you are using for a project and load those explicitly, e.g. module load Python/3.8.6-GCCcore-10.2.0 not module load Python . This makes your work more reproducible and less likely to change unexpectedly in the future.","title":"Load"},{"location":"clusters-at-yale/applications/modules/#unload","text":"You can also unload a specific module that you've previously loaded: module unload R Or unload all modules at once with: module purge Purge Lightly module purge will alert you to a sticky module that is always loaded called StdEnv . Avoid unloading StdEnv unless explicitly told to do so, othewise you will lose some important setup for the cluster you are on.","title":"Unload"},{"location":"clusters-at-yale/applications/modules/#module-collections","text":"","title":"Module Collections"},{"location":"clusters-at-yale/applications/modules/#save-collections","text":"It can be a pain to enter a long list of modules every time you return to a project. Module collections allow you to create sets of modules to load together. This method is particularly useful if you have two or more module sets that may conflict with one another. Save a collection of modules by first loading all the modules you want to save together then run: module save environment_name (replace environment_name with something more meaningful to you)","title":"Save Collections"},{"location":"clusters-at-yale/applications/modules/#restore-collections","text":"Load a collection with module restore : module restore environment_name To modify a collection: restore it, make the desired changes by load ing and/or unload ing modules, then save it to the same name.","title":"Restore Collections"},{"location":"clusters-at-yale/applications/modules/#list-collections","text":"To get a list of your collections, run: module savelist","title":"List Collections"},{"location":"clusters-at-yale/applications/modules/#ml-a-convinient-tool","text":"Lmod provides a convinient tool called ml to simplify all of the module commands.","title":"ml: A Convinient Tool"},{"location":"clusters-at-yale/applications/modules/#list-module-loaded","text":"ml","title":"List Module Loaded"},{"location":"clusters-at-yale/applications/modules/#load-modules","text":"ml Python/3.8.6-GCCcore-10.2.0","title":"Load Modules"},{"location":"clusters-at-yale/applications/modules/#unload-modules","text":"ml -Python","title":"Unload Modules"},{"location":"clusters-at-yale/applications/modules/#with-moudle-sub-commands","text":"ml can be used to replace the module command. It can take all the sub-commands from module and works the same way as module does. 
ml load Python R ml unload Python ml spider Python ml avail ml whatis Python ml key Python ml purge ml save test ml restore test","title":"With moudle Sub-commands"},{"location":"clusters-at-yale/applications/modules/#environment-variables","text":"To refer to the directory where the software from a module is stored, you can use the environment variable $EBROOTMODULENAME where MODULENAME is the name of the module in all caps with no spaces. This can be useful for finding the executables, libraries, or readme files that are included with the software: [ netid@node ~ ] $ module load SAMtools [ netid@node ~ ] $ echo $EBVERSIONSAMTOOLS 1 .11 [ netid@node ~ ] $ ls $EBROOTSAMTOOLS bin easybuild include lib lib64 share [ netid@node ~ ] $ ls $EBROOTSAMTOOLS /bin ace2sam maq2sam-short psl2sam.pl soap2sam.pl blast2sam.pl md5fa r2plot.lua vcfutils.lua bowtie2sam.pl md5sum-lite sam2vcf.pl wgsim export2sam.pl novo2sam.pl samtools wgsim_eval.pl interpolate_sam.pl plot-ampliconstats samtools.pl zoom2sam.pl maq2sam-long plot-bamstats seq_cache_populate.pl","title":"Environment Variables"},{"location":"clusters-at-yale/applications/modules/#further-reading","text":"You can view documentation while on the cluster using the command: man module There is even more information at the offical Lmod website and related documentation .","title":"Further Reading"},{"location":"clusters-at-yale/applications/toolchains/","text":"Software Module Toolchains The YCRC uses a framework called EasyBuild to build and install the software you access via the module system . Toolchains When we install software, we use pre-defined build environment modules called toolchains. These are modules that include dependencies like compilers and libraries such as GCC, OpenMPI, CUDA, etc. We do this to keep our build process simpler, and to ensure that sets of software modules loaded together function properly. The two groups of toolchains we use on the YCRC clusters are foss and intel , which hierarchically include some shared sub-toolchains. Toolchains will have versions associated with the version of the compiler and/or when the toolchain was composed. Toolchain names and versions are appended as suffixes in module names. This tells you that a module was built with that toolchain and which other modules are compatible with it. The YCRC maintains a rolling two toolchain version support model. The toolchain versions supported on each cluster are listed in the Module Lifecycle documentation. Free Open Source Software ( foss ) The foss toolchains are versioned with a yearletter scheme, e.g. foss/2020b is the second foss toolchain composed in 2020. Software modules that were built with a sub-toolchain, e.g. GCCcore , are still safe to load with their parents as long as their versions match. The major difference between foss and fosscuda is that fosscuda includes CUDA and builds applications for GPUs by default. You shoould only use fosscuda modules on nodes with GPUs . Below is a tree depicting which toolchains inherit each other. foss: gompi + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompi: GCC + OpenMPI \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection fosscuda: gompic + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompic: gcccuda + CUDA-enabled OpenMPI \u2514\u2500\u2500 gcccuda: GCC + CUDA \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection Intel The YCRC licenses Intel Parallel Studio XE (Intel oneAPI Base & HPC Toolkit coming soon). 
The intel and iomkl toolchains are versioned with a yearletter scheme, e.g. intel/2020b is the second intel toolchain composed in 2020. The major difference between iomkl and intel is MPI - intel uses Intel's MPI implementation and iomkl uses OpenMPI. Below is a tree depicting which toolchains inherit each other. iomkl: iompi + Intel Math Kernel Library \u2514\u2500\u2500 iompi: iccifort + OpenMPI \u2514\u2500\u2500 iccifort: Intel compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection intel: iimpi + Intel Math Kernel Library \u2514\u2500\u2500 iimpi: iccifort + Intel MPI \u2514\u2500\u2500 iccifort: Intel C/C++/Fortran compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection What Versions Match? To see what versions of sub-toolchains are compatible with their parents, load a foss or intel module of interest and run module list . [ netid@node ~ ] $ module load foss/2020b [ netid@node ~ ] $ module list Currently Loaded Modules: 1 ) StdEnv ( S ) 7 ) XZ/5.2.5-GCCcore-10.2.0 13 ) OpenMPI/4.0.5-GCC-10.2.0 2 ) GCCcore/10.2.0 8 ) libxml2/2.9.10-GCCcore-10.2.0 14 ) OpenBLAS/0.3.12-GCC-10.2.0 3 ) zlib/1.2.11-GCCcore-10.2.0 9 ) libpciaccess/0.16-GCCcore-10.2.0 15 ) gompi/2020b 4 ) binutils/2.35-GCCcore-10.2.0 10 ) hwloc/2.2.0-GCCcore-10.2.0 16 ) FFTW/3.3.8-gompi-2020b 5 ) GCC/10.2.0 11 ) UCX/1.9.0-GCCcore-10.2.0 17 ) ScaLAPACK/2.1.0-gompi-2020b 6 ) numactl/2.0.13-GCCcore-10.2.0 12 ) libfabric/1.11.0-GCCcore-10.2.0 18 ) foss/2020b Where: S: Module is Sticky, requires --force to unload or purge Here you see that foss/2020b includes GCCcore/10.2.0 , so modules with either the foss-2020b or GCCcore-10.2.0 should be compatible.","title":"Module Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#software-module-toolchains","text":"The YCRC uses a framework called EasyBuild to build and install the software you access via the module system .","title":"Software Module Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#toolchains","text":"When we install software, we use pre-defined build environment modules called toolchains. These are modules that include dependencies like compilers and libraries such as GCC, OpenMPI, CUDA, etc. We do this to keep our build process simpler, and to ensure that sets of software modules loaded together function properly. The two groups of toolchains we use on the YCRC clusters are foss and intel , which hierarchically include some shared sub-toolchains. Toolchains will have versions associated with the version of the compiler and/or when the toolchain was composed. Toolchain names and versions are appended as suffixes in module names. This tells you that a module was built with that toolchain and which other modules are compatible with it. The YCRC maintains a rolling two toolchain version support model. The toolchain versions supported on each cluster are listed in the Module Lifecycle documentation.","title":"Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#free-open-source-software-foss","text":"The foss toolchains are versioned with a yearletter scheme, e.g. foss/2020b is the second foss toolchain composed in 2020. Software modules that were built with a sub-toolchain, e.g. GCCcore , are still safe to load with their parents as long as their versions match. The major difference between foss and fosscuda is that fosscuda includes CUDA and builds applications for GPUs by default. You shoould only use fosscuda modules on nodes with GPUs . Below is a tree depicting which toolchains inherit each other. 
foss: gompi + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompi: GCC + OpenMPI \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection fosscuda: gompic + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompic: gcccuda + CUDA-enabled OpenMPI \u2514\u2500\u2500 gcccuda: GCC + CUDA \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection","title":"Free Open Source Software (foss)"},{"location":"clusters-at-yale/applications/toolchains/#intel","text":"The YCRC licenses Intel Parallel Studio XE (Intel oneAPI Base & HPC Toolkit coming soon). The intel and iomkl toolchains are versioned with a yearletter scheme, e.g. intel/2020b is the second intel toolchain composed in 2020. The major difference between iomkl and intel is MPI - intel uses Intel's MPI implementation and iomkl uses OpenMPI. Below is a tree depicting which toolchains inherit each other. iomkl: iompi + Intel Math Kernel Library \u2514\u2500\u2500 iompi: iccifort + OpenMPI \u2514\u2500\u2500 iccifort: Intel compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection intel: iimpi + Intel Math Kernel Library \u2514\u2500\u2500 iimpi: iccifort + Intel MPI \u2514\u2500\u2500 iccifort: Intel C/C++/Fortran compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection","title":"Intel"},{"location":"clusters-at-yale/applications/toolchains/#what-versions-match","text":"To see what versions of sub-toolchains are compatible with their parents, load a foss or intel module of interest and run module list . [ netid@node ~ ] $ module load foss/2020b [ netid@node ~ ] $ module list Currently Loaded Modules: 1 ) StdEnv ( S ) 7 ) XZ/5.2.5-GCCcore-10.2.0 13 ) OpenMPI/4.0.5-GCC-10.2.0 2 ) GCCcore/10.2.0 8 ) libxml2/2.9.10-GCCcore-10.2.0 14 ) OpenBLAS/0.3.12-GCC-10.2.0 3 ) zlib/1.2.11-GCCcore-10.2.0 9 ) libpciaccess/0.16-GCCcore-10.2.0 15 ) gompi/2020b 4 ) binutils/2.35-GCCcore-10.2.0 10 ) hwloc/2.2.0-GCCcore-10.2.0 16 ) FFTW/3.3.8-gompi-2020b 5 ) GCC/10.2.0 11 ) UCX/1.9.0-GCCcore-10.2.0 17 ) ScaLAPACK/2.1.0-gompi-2020b 6 ) numactl/2.0.13-GCCcore-10.2.0 12 ) libfabric/1.11.0-GCCcore-10.2.0 18 ) foss/2020b Where: S: Module is Sticky, requires --force to unload or purge Here you see that foss/2020b includes GCCcore/10.2.0 , so modules with either the foss-2020b or GCCcore-10.2.0 should be compatible.","title":"What Versions Match?"},{"location":"clusters-at-yale/guides/","text":"Guides to Software & Tools The YCRC installs and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our software module guide for more information. To see all pre-installed software, you can run module avail on a cluster to page through all available software. For certain software packages, we provide guides for running on our clusters. If you have tips for running a commonly used software and would like to contribute them to our Software Guides, contact us or submit a pull request on the docs repo . Additional Guides For additional guides and tutorials, see our catalog of recommended online tutorials on Python, R, unix commands and more .","title":"Overview"},{"location":"clusters-at-yale/guides/#guides-to-software-tools","text":"The YCRC installs and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. 
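For instance, a small sketch using only commands from the module guide (the module names are examples; check module avail for what is actually installed on your cluster):

module avail Python                        # see which Python versions are installed
module load Python/3.8.6-GCCcore-10.2.0    # load one version explicitly
module unload Python                       # later, drop it...
module load R                              # ...and bring in a different package instead
module list                                # confirm what is currently in your environment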
See our software module guide for more information. To see all pre-installed software, you can run module avail on a cluster to page through all available software. For certain software packages, we provide guides for running on our clusters. If you have tips for running a commonly used software and would like to contribute them to our Software Guides, contact us or submit a pull request on the docs repo .","title":"Guides to Software & Tools"},{"location":"clusters-at-yale/guides/#additional-guides","text":"For additional guides and tutorials, see our catalog of recommended online tutorials on Python, R, unix commands and more .","title":"Additional Guides"},{"location":"clusters-at-yale/guides/cesm/","text":"CESM/CAM This is a quick start guide for CESM at Yale. You will still need to read the CESM User Guide and work with your fellow research group members to design and run your simulations, but this guide covers the basics that are specific to running CESM at Yale. CESM User Guides CESM1.0.4 User\u2019s Guide CESM1.1.z User\u2019s Guide CESM User\u2019s Guide (CESM1.2 Release Series User\u2019s Guide) (PDF) Modules CESM 1.0.4, 1.2.2, 2.x are available on Grace. For CESM 2.1.0, load the following modules module load CESM/2.1.0-iomkl-2018a For older versions of CESM, you will need to use the old modules. These old version of CESM do not work with the new modules module use /vast/palmer/apps/old.grace/Modules module avail CESM Once you have located your module, run module load with the module name from above. With either module, the module will configure your environment with the Intel compiler, OpenMPI and NetCDF libraries as well as set the location of the Yale\u2019s repository of CESM input data. If you will be primarily using CESM, you can avoid rerunning the module load command every time you login by saving it to your default environment: module load module save Input Data To reduce the amount of data duplication on the cluster, we keep one centralized repository of CESM input data. The YCRC staff are only people who can add to that directory. If your build fails due to missing inputdata, contact us with your create_newcase line and we will download that data for you. Run CESM CESM needs to be rebuilt separately for each run. As a result, running CESM is more complicated than a standard piece of software where you would just run the executable. Create Your Case Each simulation is called a \u201ccase\u201d. Loading a CESM module will put the create_newcase script in your path, so you can call it as follows. This will create a directory with your case name, that we will refer to as $CASE through out the rest of the guide. create_newcase -case $CASE -compset = -res = -mach = cd $CASE The mach parameters for Grace is yalegrace for CESM 1.0.4 and gracempi for CESM 1.2.2 and CESM 2.x , respectively. For example create_newcase --case $CASE --compset = B1850 --res = f09_g17 --mach = gracempi cd $CASE Setup Your Case If you are making any changes to the namelist files (such as increasing the duration of the simulation), do those before running the setup scripts below. CESM 1.0.X ./configure -case CESM 1.1.X and CESM 1.2.X ./cesm_setup CESM 2.X ./case.setup Build Your Case After you run the setup script, there will be a set of the scripts in your case directory that start with your case name. To compile your simulation executable, first move to an interactive job and then run the build script corresponding to your case. 
# CESM 1.x salloc -c 4 module load # = the appropriate module for your CESM version ./ $CASE . $mach .build # CESM 2.x salloc -c 4 module load # = the appropriate module for your CESM version ./case.build --skip-provenance-check Note the --skip-provenance-check flag is required with CESM 2.x due to the changes made to port the code to Grace. For more details on interactive jobs, see our Slurm documentation . During the build, CESM will create a corresponding directory in your scratch60 or project directory at ls ~/scratch60/CESM/$CASE This directory will contain all the outputs from your simulation as well as logs and the cesm.exe executable. Common Build Issues Make sure you compile on an interactive node as described above. If you build fails, it will direct you to look in a bldlog file. If that log complains that it can\u2019t find mpirun, NetCDF or another library or executable, make sure you have the correct CESM module loaded. It can helpful to run module purge before the module load to ensure a reproducible environment. If you get an error saying ERROR: Error gathering provenance information from manage_externals , rerun the build using the suggested flag, e.g. ./case.build --skip-provenance-check . Submit Your Case Once the build is complete, which can take 5-15 minutes, you can submit your case with the submit script. # CESM 1.x ./ $CASE . $mach .submit # CESM 2.x ./case.submit For more details on monitoring your submitted jobs, see our Slurm documentation . Changing Slurm Partition In CESM 2.x, to change the partition in which your main jobs will run, use the following command: ./xmlchange JOB_QUEUE = scavenge --subgroup case .run The associated archive job will still be submitted to the day partition. Troubleshoot Your Run If your run doesn\u2019t complete, there are a few places to look to identify the error. CESM writes to multiple log files for the different components and you will likely have to look in a few to find the root cause of your error. Slurm Log In your case directory, there will be a file that looks like slurm-.log . Check that file first to make sure the job started up properly. If the last few lines in the file redirect you to look at cpl.log. file in your scratch directory, see below. If there is another error, try to address it and resubmit. CESM Run Logs If the last few lines of the slurm log direct you to look at cpl.log. file, change directory to your case \u201crun\u201d directory (usually in your project directory): cd ~/project/CESM/ $CASE /run The pointer to the cpl file is often misleading as I have found the error is usually located in one of the other logs. Instead look in the cesm.log.xxxxxx file. Towards the end there may be an error or it may signify which component was running. Then look in the log corresponding to that component to track down the issue. One shortcut to finding the relevant logs is to sort the log files by the time to see which ones were last updated: ls -ltr *log* Look at the end of the last couple logs listed and look for an indication of the error. Resolve Errors Once you have identified the lines in the logs corresponding to your error: If your log says something like Disk quota exceeded , your group is out of space in the fileset you are writing to. You can run the getquota script to get details on your disk usage. Your group will need to reduce their usage before you will be able to run successfully. 
If it looks like a model error and you don\u2019t know how to fix it, we strongly recommend Googling your error and/or looking in the CESM forums . If you are still experiencing issues, contact us . Alternative Submission Parameters By default, the submission script will submit to the \"mpi\" partition for 1 day. CESM 1.x To change this in CESM 1.x, edit your case\u2019s run script and change the partition and time. The maximum walltime in the mpi and scavenge partitions is 24 hours. For example: ## scavenge partition #SBATCH --partition=scavenge #SBATCH --time=1- CESM 2.x To change this in CESM 2.x, use ./xmlchange in your run directory. # Change partition to scavenge ./xmlchange JOB_QUEUE=scavenge # Change walltime limit to 2 days (> 24 hours is only available on PI partitions) ./xmlchange JOB_WALLCLOCK_TIME 2-00:00:00 Further Reading We recommend referencing the User Guides listed at the top of this page. CESM User Forum Our Slurm Documentation CESM is a very widely used package, you can often find answers by simply using Google. Just make sure that the solutions you find correspond to the approximate version of CESM you are using. CESM changes in subtle but significant ways between versions.","title":"CESM/CAM"},{"location":"clusters-at-yale/guides/cesm/#cesmcam","text":"This is a quick start guide for CESM at Yale. You will still need to read the CESM User Guide and work with your fellow research group members to design and run your simulations, but this guide covers the basics that are specific to running CESM at Yale.","title":"CESM/CAM"},{"location":"clusters-at-yale/guides/cesm/#cesm-user-guides","text":"CESM1.0.4 User\u2019s Guide CESM1.1.z User\u2019s Guide CESM User\u2019s Guide (CESM1.2 Release Series User\u2019s Guide) (PDF)","title":"CESM User Guides"},{"location":"clusters-at-yale/guides/cesm/#modules","text":"CESM 1.0.4, 1.2.2, 2.x are available on Grace. For CESM 2.1.0, load the following modules module load CESM/2.1.0-iomkl-2018a For older versions of CESM, you will need to use the old modules. These old version of CESM do not work with the new modules module use /vast/palmer/apps/old.grace/Modules module avail CESM Once you have located your module, run module load with the module name from above. With either module, the module will configure your environment with the Intel compiler, OpenMPI and NetCDF libraries as well as set the location of the Yale\u2019s repository of CESM input data. If you will be primarily using CESM, you can avoid rerunning the module load command every time you login by saving it to your default environment: module load module save","title":"Modules"},{"location":"clusters-at-yale/guides/cesm/#input-data","text":"To reduce the amount of data duplication on the cluster, we keep one centralized repository of CESM input data. The YCRC staff are only people who can add to that directory. If your build fails due to missing inputdata, contact us with your create_newcase line and we will download that data for you.","title":"Input Data"},{"location":"clusters-at-yale/guides/cesm/#run-cesm","text":"CESM needs to be rebuilt separately for each run. As a result, running CESM is more complicated than a standard piece of software where you would just run the executable.","title":"Run CESM"},{"location":"clusters-at-yale/guides/cesm/#create-your-case","text":"Each simulation is called a \u201ccase\u201d. Loading a CESM module will put the create_newcase script in your path, so you can call it as follows. 
This will create a directory with your case name, that we will refer to as $CASE through out the rest of the guide. create_newcase -case $CASE -compset = -res = -mach = cd $CASE The mach parameters for Grace is yalegrace for CESM 1.0.4 and gracempi for CESM 1.2.2 and CESM 2.x , respectively. For example create_newcase --case $CASE --compset = B1850 --res = f09_g17 --mach = gracempi cd $CASE","title":"Create Your Case"},{"location":"clusters-at-yale/guides/cesm/#setup-your-case","text":"If you are making any changes to the namelist files (such as increasing the duration of the simulation), do those before running the setup scripts below.","title":"Setup Your Case"},{"location":"clusters-at-yale/guides/cesm/#cesm-10x","text":"./configure -case","title":"CESM 1.0.X"},{"location":"clusters-at-yale/guides/cesm/#cesm-11x-and-cesm-12x","text":"./cesm_setup","title":"CESM 1.1.X and CESM 1.2.X"},{"location":"clusters-at-yale/guides/cesm/#cesm-2x","text":"./case.setup","title":"CESM 2.X"},{"location":"clusters-at-yale/guides/cesm/#build-your-case","text":"After you run the setup script, there will be a set of the scripts in your case directory that start with your case name. To compile your simulation executable, first move to an interactive job and then run the build script corresponding to your case. # CESM 1.x salloc -c 4 module load # = the appropriate module for your CESM version ./ $CASE . $mach .build # CESM 2.x salloc -c 4 module load # = the appropriate module for your CESM version ./case.build --skip-provenance-check Note the --skip-provenance-check flag is required with CESM 2.x due to the changes made to port the code to Grace. For more details on interactive jobs, see our Slurm documentation . During the build, CESM will create a corresponding directory in your scratch60 or project directory at ls ~/scratch60/CESM/$CASE This directory will contain all the outputs from your simulation as well as logs and the cesm.exe executable.","title":"Build Your Case"},{"location":"clusters-at-yale/guides/cesm/#common-build-issues","text":"Make sure you compile on an interactive node as described above. If you build fails, it will direct you to look in a bldlog file. If that log complains that it can\u2019t find mpirun, NetCDF or another library or executable, make sure you have the correct CESM module loaded. It can helpful to run module purge before the module load to ensure a reproducible environment. If you get an error saying ERROR: Error gathering provenance information from manage_externals , rerun the build using the suggested flag, e.g. ./case.build --skip-provenance-check .","title":"Common Build Issues"},{"location":"clusters-at-yale/guides/cesm/#submit-your-case","text":"Once the build is complete, which can take 5-15 minutes, you can submit your case with the submit script. # CESM 1.x ./ $CASE . $mach .submit # CESM 2.x ./case.submit For more details on monitoring your submitted jobs, see our Slurm documentation .","title":"Submit Your Case"},{"location":"clusters-at-yale/guides/cesm/#changing-slurm-partition","text":"In CESM 2.x, to change the partition in which your main jobs will run, use the following command: ./xmlchange JOB_QUEUE = scavenge --subgroup case .run The associated archive job will still be submitted to the day partition.","title":"Changing Slurm Partition"},{"location":"clusters-at-yale/guides/cesm/#troubleshoot-your-run","text":"If your run doesn\u2019t complete, there are a few places to look to identify the error. 
CESM writes to multiple log files for the different components and you will likely have to look in a few to find the root cause of your error.","title":"Troubleshoot Your Run"},{"location":"clusters-at-yale/guides/cesm/#slurm-log","text":"In your case directory, there will be a file that looks like slurm-.log . Check that file first to make sure the job started up properly. If the last few lines in the file redirect you to look at cpl.log. file in your scratch directory, see below. If there is another error, try to address it and resubmit.","title":"Slurm Log"},{"location":"clusters-at-yale/guides/cesm/#cesm-run-logs","text":"If the last few lines of the slurm log direct you to look at cpl.log. file, change directory to your case \u201crun\u201d directory (usually in your project directory): cd ~/project/CESM/ $CASE /run The pointer to the cpl file is often misleading as I have found the error is usually located in one of the other logs. Instead look in the cesm.log.xxxxxx file. Towards the end there may be an error or it may signify which component was running. Then look in the log corresponding to that component to track down the issue. One shortcut to finding the relevant logs is to sort the log files by the time to see which ones were last updated: ls -ltr *log* Look at the end of the last couple logs listed and look for an indication of the error.","title":"CESM Run Logs"},{"location":"clusters-at-yale/guides/cesm/#resolve-errors","text":"Once you have identified the lines in the logs corresponding to your error: If your log says something like Disk quota exceeded , your group is out of space in the fileset you are writing to. You can run the getquota script to get details on your disk usage. Your group will need to reduce their usage before you will be able to run successfully. If it looks like a model error and you don\u2019t know how to fix it, we strongly recommend Googling your error and/or looking in the CESM forums . If you are still experiencing issues, contact us .","title":"Resolve Errors"},{"location":"clusters-at-yale/guides/cesm/#alternative-submission-parameters","text":"By default, the submission script will submit to the \"mpi\" partition for 1 day.","title":"Alternative Submission Parameters"},{"location":"clusters-at-yale/guides/cesm/#cesm-1x","text":"To change this in CESM 1.x, edit your case\u2019s run script and change the partition and time. The maximum walltime in the mpi and scavenge partitions is 24 hours. For example: ## scavenge partition #SBATCH --partition=scavenge #SBATCH --time=1-","title":"CESM 1.x"},{"location":"clusters-at-yale/guides/cesm/#cesm-2x_1","text":"To change this in CESM 2.x, use ./xmlchange in your run directory. # Change partition to scavenge ./xmlchange JOB_QUEUE=scavenge # Change walltime limit to 2 days (> 24 hours is only available on PI partitions) ./xmlchange JOB_WALLCLOCK_TIME 2-00:00:00","title":"CESM 2.x"},{"location":"clusters-at-yale/guides/cesm/#further-reading","text":"We recommend referencing the User Guides listed at the top of this page. CESM User Forum Our Slurm Documentation CESM is a very widely used package, you can often find answers by simply using Google. Just make sure that the solutions you find correspond to the approximate version of CESM you are using. 
CESM changes in subtle but significant ways between versions.","title":"Further Reading"},{"location":"clusters-at-yale/guides/checkpointing/","text":"Checkpoint Long-running Jobs When working with long-running jobs and work-flows, it becomes very important to establish checkpoints along the way. This will ensure that if your job is interrupted you will be able to restart it without having to go back to the begining of the job. DMTCP \"Distributed Multithreaded Checkpointing\" allows you to easily save the state of your running job and restart it from that point. This can be very useful if your job fails for any number of reasons: it exceeds the time limit, is preempted from scavenge, the compute node crashes, etc. DMTCP does not require any changes to your code or recompilation. It should work on most sequential or multithreaded/multiprocessing programs as is. module load DMTCP Run Your Job Interactively Under DMTCP For this simple example, we'll use this python script count.py import time i = 0 while True : print ( i , flush = True ) i += 1 time . sleep ( 1 ) Run the script interactively using dmtcp_launch : dmtcp_launch -i 5 python3 count.py It will begin printing to the terminal. In the background, DMTCP will be writing a checkpoint file every 5 seconds. Let it count for a while, then kill it with Ctrl + c . If you look in that directory, you'll see several files related to DMTCP. The *.dmtcp file is the checkpoint file. To restart the job from the last checkpoint, do: dmtcp_restart -i 5 *.dmtcp In practice, you'll most likely want to use DMTCP to checkpoint batch jobs, rather than interactive sessions. Checkpoint a Batch Job This script will submit the job under DMTCP's checkpointing. Here we use a more reasonable checkpoint interval of 300 seconds. You will want to experiment to see how long it takes to write your application's checkpoint file, and tune your interval accordingly. #!/bin/bash module load DMTCP dmtcp_launch -i 300 python count.py Then, if the job fails, you can resubmit it with this script: #!/bin/bash module load DMTCP dmtcp_restart -i 300 *.dmtcp Note that we are using wildcards to name the DMTCP file, which will obviously only work correctly if there is only one checkpoint file in the directory. Alternatively you can edit the script each time and explicitly name the correct checkpoint file. Restart a Preempted job Here is an example job script that will start a job running, periodically checkpoint it, and automatically requeue the job if it is preempted: #!/bin/bash #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #edit following line to put the appropriate module module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTART_COUNT = $cnt \" dmtcp_coordinator -i 5 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart -j *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Launch the job with sbatch, and watch the numbers appear in the slurm*.out file. Then, simulate preemption by doing: $ scontrol requeue 123456789 Because the script specified --requeue, the job will be returned to pending. Slurm automatically sets a \"Begin Time\" a couple of minutes in the future, so the job will pend until then, at which point it will begin running again, so be patient. This time the script will invoke dmtcp_restart, and will continue from the checkpoint. If you look at the output, you'll see from the numbers that the job backed up to the previous checkpoint and restarted. 
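A few commands can help confirm that the restart behaved as expected (a sketch; the slurm-*.out pattern assumes the default Slurm output file name, and the strings searched for are the ones echoed by the example script above):

ls -lh *.dmtcp                           # the checkpoint file(s) DMTCP has written
grep SLURM_RESTART_COUNT slurm-*.out     # how many times Slurm has requeued the job
grep 'doing restart' slurm-*.out         # the script's own marker that it took the restart path
tail -n 20 slurm-*.out                   # the counter should resume near its last checkpointed value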
You can requeue the job several times, and each time it will restart from the last checkpoint. You should be able to adapt this script to your own job by loading any required modules and replacing \"python count.py\" with your program's invocation. This example is much more complicated than our previous examples. Some notes: DMTCP uses a \"controller\" to manage the checkpointing. In the simple example, dmtcp_launch transparently started a controller on the default port 7779. In this case, we explicitly start a \"controller\" on a random port and communicate the port number via an environment variable. This prevents collisions if multiple DMTCP sessions run on the same node. The -j flag to dmtcp_launch tells it to join the existing controller. On initial launch we remove existing checkpoint files. This may not be a good idea in practice. The env var SLURM_RESTART_COUNT is used to determine if this is a restart or not. Parallel Execution with DMTCP DMTCP can checkpoint multithreaded/multiprocess parallel applications. In this example we run NAMD (a molecular dynamics simulation), using 6 threads on 6 cpus. We also restart automatically on preemption, as above. #!/bin/bash #SBATCH -c 6 #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #SBATCH -C haswell #edit following line to put the appropriate module module load NAMD/2.12-multicore module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTARTCOUNT = $cnt \" dmtcp_coordinator -i 90 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_HOST = ` hostname ` export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Additional notes dmtcp reopens files when recovering from checkpoints, so most file writes should just work. However, when requeuing jobs as shown above, you should take care to do #SBATCH --open-mode=append keep in mind that recovery from checkpoints does imply backing up to the point of the previous checkpoint. If your program is continuously writing output, the output since the last checkpoint will be replicated. For many programs (like NAMD) the output is really just logging, so this is not a problem. by default, dmtcp compresses checkpoint files. For large files this can take a long time. You can turn off comporession with dmtcp_launch --no-gzip . dmtcp creates a convenience restart script called restart_dmtcp_script.sh with every checkpoint. In theory you can simply call it to restart: ./restart_dmtcp_script.sh rather than restart_dmtcp *.dmtcp However, we have found it to be unreliable. Your mileage may vary. The above examples just scratch the surface. For more information: A DMTCP quickstart and documentation A very helpful page at NERSC","title":"Checkpoint Long-running Jobs"},{"location":"clusters-at-yale/guides/checkpointing/#checkpoint-long-running-jobs","text":"When working with long-running jobs and work-flows, it becomes very important to establish checkpoints along the way. This will ensure that if your job is interrupted you will be able to restart it without having to go back to the begining of the job. DMTCP \"Distributed Multithreaded Checkpointing\" allows you to easily save the state of your running job and restart it from that point. This can be very useful if your job fails for any number of reasons: it exceeds the time limit, is preempted from scavenge, the compute node crashes, etc. DMTCP does not require any changes to your code or recompilation. 
It should work on most sequential or multithreaded/multiprocessing programs as is. module load DMTCP","title":"Checkpoint Long-running Jobs"},{"location":"clusters-at-yale/guides/checkpointing/#run-your-job-interactively-under-dmtcp","text":"For this simple example, we'll use this python script count.py import time i = 0 while True : print ( i , flush = True ) i += 1 time . sleep ( 1 ) Run the script interactively using dmtcp_launch : dmtcp_launch -i 5 python3 count.py It will begin printing to the terminal. In the background, DMTCP will be writing a checkpoint file every 5 seconds. Let it count for a while, then kill it with Ctrl + c . If you look in that directory, you'll see several files related to DMTCP. The *.dmtcp file is the checkpoint file. To restart the job from the last checkpoint, do: dmtcp_restart -i 5 *.dmtcp In practice, you'll most likely want to use DMTCP to checkpoint batch jobs, rather than interactive sessions.","title":"Run Your Job Interactively Under DMTCP"},{"location":"clusters-at-yale/guides/checkpointing/#checkpoint-a-batch-job","text":"This script will submit the job under DMTCP's checkpointing. Here we use a more reasonable checkpoint interval of 300 seconds. You will want to experiment to see how long it takes to write your application's checkpoint file, and tune your interval accordingly. #!/bin/bash module load DMTCP dmtcp_launch -i 300 python count.py Then, if the job fails, you can resubmit it with this script: #!/bin/bash module load DMTCP dmtcp_restart -i 300 *.dmtcp Note that we are using wildcards to name the DMTCP file, which will obviously only work correctly if there is only one checkpoint file in the directory. Alternatively you can edit the script each time and explicitly name the correct checkpoint file.","title":"Checkpoint a Batch Job"},{"location":"clusters-at-yale/guides/checkpointing/#restart-a-preempted-job","text":"Here is an example job script that will start a job running, periodically checkpoint it, and automatically requeue the job if it is preempted: #!/bin/bash #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #edit following line to put the appropriate module module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTART_COUNT = $cnt \" dmtcp_coordinator -i 5 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart -j *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Launch the job with sbatch, and watch the numbers appear in the slurm*.out file. Then, simulate preemption by doing: $ scontrol requeue 123456789 Because the script specified --requeue, the job will be returned to pending. Slurm automatically sets a \"Begin Time\" a couple of minutes in the future, so the job will pend until then, at which point it will begin running again, so be patient. This time the script will invoke dmtcp_restart, and will continue from the checkpoint. If you look at the output, you'll see from the numbers that the job backed up to the previous checkpoint and restarted. You can requeue the job several times, and each time it will restart from the last checkpoint. You should be able to adapt this script to your own job by loading any required modules and replacing \"python count.py\" with your program's invocation. This example is much more complicated than our previous examples. Some notes: DMTCP uses a \"controller\" to manage the checkpointing. In the simple example, dmtcp_launch transparently started a controller on the default port 7779. 
In this case, we explicitly start a \"controller\" on a random port and communicate the port number via an environment variable. This prevents collisions if multiple DMTCP sessions run on the same node. The -j flag to dmtcp_launch tells it to join the existing controller. On initial launch we remove existing checkpoint files. This may not be a good idea in practice. The env var SLURM_RESTART_COUNT is used to determine if this is a restart or not.","title":"Restart a Preempted job"},{"location":"clusters-at-yale/guides/checkpointing/#parallel-execution-with-dmtcp","text":"DMTCP can checkpoint multithreaded/multiprocess parallel applications. In this example we run NAMD (a molecular dynamics simulation), using 6 threads on 6 cpus. We also restart automatically on preemption, as above. #!/bin/bash #SBATCH -c 6 #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #SBATCH -C haswell #edit following line to put the appropriate module module load NAMD/2.12-multicore module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTARTCOUNT = $cnt \" dmtcp_coordinator -i 90 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_HOST = ` hostname ` export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi","title":"Parallel Execution with DMTCP"},{"location":"clusters-at-yale/guides/checkpointing/#additional-notes","text":"dmtcp reopens files when recovering from checkpoints, so most file writes should just work. However, when requeuing jobs as shown above, you should take care to do #SBATCH --open-mode=append keep in mind that recovery from checkpoints does imply backing up to the point of the previous checkpoint. If your program is continuously writing output, the output since the last checkpoint will be replicated. For many programs (like NAMD) the output is really just logging, so this is not a problem. by default, dmtcp compresses checkpoint files. For large files this can take a long time. You can turn off comporession with dmtcp_launch --no-gzip . dmtcp creates a convenience restart script called restart_dmtcp_script.sh with every checkpoint. In theory you can simply call it to restart: ./restart_dmtcp_script.sh rather than restart_dmtcp *.dmtcp However, we have found it to be unreliable. Your mileage may vary. The above examples just scratch the surface. For more information: A DMTCP quickstart and documentation A very helpful page at NERSC","title":"Additional notes"},{"location":"clusters-at-yale/guides/clustershell/","text":"ClusterShell ClusterShell is a useful Python package for executing arbitrary commands across multiple hosts. On the Yale clusters it provides a relatively simple way for you to run commands on nodes your jobs are running on, and collect the results. The two most useful commands provided are nodeset , which can show and manipulate node lists and clush , which can run commands on multiple nodes at once. Configuration To set up ClusterShell, make sure you have a .config directory and a copy our groups.conf file there. For more info about ClusterShell configuration for Slurm, see the official docs . mkdir -p ~/.config/clustershell wget https://docs.ycrc.yale.edu/_static/files/clustershell_groups.conf -O ~/.config/clustershell/groups.conf We provide ClusterShell as a module, but you can also install it with conda . 
Module module load ClusterShell Conda module load miniconda conda create -yn clustershell python pip source activate clustershell pip install ClusterShell Examples nodeset The nodeset command uses sinfo underneath but has slightly different syntax. You can use it to ask about node states and nodes your job is running on. The nice difference is you can ask for folded (e.g. c[01-02]n[12,15,18] ) or expanded (e.g. c01n01 c01n02 ... ) node lists. The groups useful to you that we have configured are @user , @job and @state . User group List expanded node names where user abc123 has jobs running # similar to squeue -h -u abc123 -o \"%N\" nodeset -e @user:abc123 Job group List folded nodes where job 1234567 is running # similar to squeue -h -j 1234567 -o \"%N\" nodeset -f @job:1234567 State group List expanded node names that are idle according to slurm # similar to sinfo -t IDLE -o \"%N\" nodeset -e @state:idle clush The clush command uses the node grouping syntax from nodeset to allow you to run commands on those nodes. clush uses ssh to connect to each of these nodes. You can use the -b option to gather output from nodes with same output into the same lines. Leaving this out will report on each node separately. Info You can only ssh to, and therefore run clush on, nodes where you have active jobs. Local storage Get a list of files in /tmp/abs on all nodes where job 654321 is running. clush -bw @job:654321 ls /tmp/abc123 # don't gather identical output clush -w @job:654321 ls /tmp/abc123 CPU usage Show %cpu, memory usage, and command for all nodes running any jobs owned by user abc123 . clush -bw @user:abc123 ps -uabc123 -o%cpu,rss,cmd GPU usage Show what's running on all the GPUs on the nodes associated with your job 654321 . clush -bw @job:654321 nvidia-smi --format = csv --query-compute-apps = process_name,used_gpu_memory","title":"ClusterShell"},{"location":"clusters-at-yale/guides/clustershell/#clustershell","text":"ClusterShell is a useful Python package for executing arbitrary commands across multiple hosts. On the Yale clusters it provides a relatively simple way for you to run commands on nodes your jobs are running on, and collect the results. The two most useful commands provided are nodeset , which can show and manipulate node lists and clush , which can run commands on multiple nodes at once.","title":"ClusterShell"},{"location":"clusters-at-yale/guides/clustershell/#configuration","text":"To set up ClusterShell, make sure you have a .config directory and a copy our groups.conf file there. For more info about ClusterShell configuration for Slurm, see the official docs . mkdir -p ~/.config/clustershell wget https://docs.ycrc.yale.edu/_static/files/clustershell_groups.conf -O ~/.config/clustershell/groups.conf We provide ClusterShell as a module, but you can also install it with conda .","title":"Configuration"},{"location":"clusters-at-yale/guides/clustershell/#module","text":"module load ClusterShell","title":"Module"},{"location":"clusters-at-yale/guides/clustershell/#conda","text":"module load miniconda conda create -yn clustershell python pip source activate clustershell pip install ClusterShell","title":"Conda"},{"location":"clusters-at-yale/guides/clustershell/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/guides/clustershell/#nodeset","text":"The nodeset command uses sinfo underneath but has slightly different syntax. You can use it to ask about node states and nodes your job is running on. The nice difference is you can ask for folded (e.g. 
c[01-02]n[12,15,18] ) or expanded (e.g. c01n01 c01n02 ... ) node lists. The groups useful to you that we have configured are @user , @job and @state .","title":"nodeset"},{"location":"clusters-at-yale/guides/clustershell/#user-group","text":"List expanded node names where user abc123 has jobs running # similar to squeue -h -u abc123 -o \"%N\" nodeset -e @user:abc123","title":"User group"},{"location":"clusters-at-yale/guides/clustershell/#job-group","text":"List folded nodes where job 1234567 is running # similar to squeue -h -j 1234567 -o \"%N\" nodeset -f @job:1234567","title":"Job group"},{"location":"clusters-at-yale/guides/clustershell/#state-group","text":"List expanded node names that are idle according to slurm # similar to sinfo -t IDLE -o \"%N\" nodeset -e @state:idle","title":"State group"},{"location":"clusters-at-yale/guides/clustershell/#clush","text":"The clush command uses the node grouping syntax from nodeset to allow you to run commands on those nodes. clush uses ssh to connect to each of these nodes. You can use the -b option to gather output from nodes with same output into the same lines. Leaving this out will report on each node separately. Info You can only ssh to, and therefore run clush on, nodes where you have active jobs.","title":"clush"},{"location":"clusters-at-yale/guides/clustershell/#local-storage","text":"Get a list of files in /tmp/abs on all nodes where job 654321 is running. clush -bw @job:654321 ls /tmp/abc123 # don't gather identical output clush -w @job:654321 ls /tmp/abc123","title":"Local storage"},{"location":"clusters-at-yale/guides/clustershell/#cpu-usage","text":"Show %cpu, memory usage, and command for all nodes running any jobs owned by user abc123 . clush -bw @user:abc123 ps -uabc123 -o%cpu,rss,cmd","title":"CPU usage"},{"location":"clusters-at-yale/guides/clustershell/#gpu-usage","text":"Show what's running on all the GPUs on the nodes associated with your job 654321 . clush -bw @job:654321 nvidia-smi --format = csv --query-compute-apps = process_name,used_gpu_memory","title":"GPU usage"},{"location":"clusters-at-yale/guides/cmd-line-args/","text":"Pass Values into Jobs A useful tool when running jobs on the clusters is to be able to pass variables into a script without modifying any code. This can include specifying the name of a data file to be processed, or setting a variable to a specific value. Generally, there are two ways of achieving this: environment variables and command-line arguments. Here we will work through how to implement these two approaches in both Python and R. Python Environment Variables In python, environment variables are accessed via the os package ( docs page ). In particular, we can use os.getenv to retrieve environment variables set prior to launching the python script. For example, consider a python script designed to process a data file: def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output We can use an environment variable ( INPUT_DATA_FILE ) to provide the filename of the data to be processed. The python script ( my_script.py ) is modified to retrieve this variable and analyze the given datafile: import os file_name = os . getenv ( \"INPUT_DATA_FILE\" ) def file_cruncher ( file_name ): f = open ( file_name ) data = f . 
read () output = process ( data ) # processing code goes here return output To process this data file, you would simply run: export INPUT_DATA_FILE = /path/to/file/input_0.dat python my_script.py This avoids having to modify the python script to change which datafile is processed, we only need to change the environment variable. Command-line Arguments Similarly, one can use command-line arguments to pass values into a script. In python, there are two main packages designed for handling arguments. First is the simple sys.argv function which parses command-line arguments into a list of strings: import sys for a in sys . argv : print ( a ) Running this with a few arguments: $ python my_script.py a b c my_script.py a b c The first element in sys.argv is the name of the script, and then all subsequent arguments follow. Secondly, there is the more fully-featured argparse package ( docs page )which offers many advanced tools to manage command-line arguments. Take a look at their documentation for examples of how to use argparse . R Just as with Python, R provides comparable utilities to access command-line arguments and environment variables. Environment Variables The Sys.getenv utility ( docs page ) works nearly identically to the Python implementation. > Sys.getenv ( 'HOSTNAME' ) [ 1 ] \"grace2.grace.hpc.yale.internal\" Just like Python, these values are always returned as string representations, so if the variable of interest is a number it will need to be cast into an integer using as.numeric() . Command-line Arguments To collect command-line arguments in R use the commandArgs function: args = commandArgs ( trailingOnly = TRUE ) for ( x in args ){ print ( x ) } The trailingOnly=TRUE option will limit args to contain only those arguments which follow the script: Rscript my_script.R a b c [ 1 ] \"a\" [ 1 ] \"b\" [ 1 ] \"c\" There is a more advanced and detailed package for managing command-line arguments called optparse ( docs page ). This can be used to create more featured scripts in a similar way to Python's argparse . Slurm Environment Variables Slurm sets a number of environment variables detailing the layout of every job. These include: SLURM_JOB_ID : the unique jobid given to each job. Useful to set unique output directories SLURM_CPUS_PER_TASK : the number of CPUs allocated for each task. Useful as a replacement for R's detectCores or Python's multiprocessing.cpu_count which report the physical number of CPUs and not the number allocated by Slurm. SLURM_ARRAY_TASK_ID : the unique array index for each element of a job array. Useful to un-roll a loop or to set a unique random seed for parallel simulations. These can be leveraged within batch scripts using the above techniques to either pass on the command-line or directly reading the environment variable to control how a script runs. For example, if a script previously looped over values ranging from 0-9, we can modify the script and create a job array which runs each iteration separately in parallel using SLURM_ARRAY_TASK_ID to tell each element of the job array which value to use.","title":"Pass Values into Jobs"},{"location":"clusters-at-yale/guides/cmd-line-args/#pass-values-into-jobs","text":"A useful tool when running jobs on the clusters is to be able to pass variables into a script without modifying any code. This can include specifying the name of a data file to be processed, or setting a variable to a specific value. Generally, there are two ways of achieving this: environment variables and command-line arguments. 
Here we will work through how to implement these two approaches in both Python and R.","title":"Pass Values into Jobs"},{"location":"clusters-at-yale/guides/cmd-line-args/#python","text":"","title":"Python"},{"location":"clusters-at-yale/guides/cmd-line-args/#environment-variables","text":"In python, environment variables are accessed via the os package ( docs page ). In particular, we can use os.getenv to retrieve environment variables set prior to launching the python script. For example, consider a python script designed to process a data file: def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output We can use an environment variable ( INPUT_DATA_FILE ) to provide the filename of the data to be processed. The python script ( my_script.py ) is modified to retrieve this variable and analyze the given datafile: import os file_name = os . getenv ( \"INPUT_DATA_FILE\" ) def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output To process this data file, you would simply run: export INPUT_DATA_FILE = /path/to/file/input_0.dat python my_script.py This avoids having to modify the python script to change which datafile is processed, we only need to change the environment variable.","title":"Environment Variables"},{"location":"clusters-at-yale/guides/cmd-line-args/#command-line-arguments","text":"Similarly, one can use command-line arguments to pass values into a script. In python, there are two main packages designed for handling arguments. First is the simple sys.argv function which parses command-line arguments into a list of strings: import sys for a in sys . argv : print ( a ) Running this with a few arguments: $ python my_script.py a b c my_script.py a b c The first element in sys.argv is the name of the script, and then all subsequent arguments follow. Secondly, there is the more fully-featured argparse package ( docs page )which offers many advanced tools to manage command-line arguments. Take a look at their documentation for examples of how to use argparse .","title":"Command-line Arguments"},{"location":"clusters-at-yale/guides/cmd-line-args/#r","text":"Just as with Python, R provides comparable utilities to access command-line arguments and environment variables.","title":"R"},{"location":"clusters-at-yale/guides/cmd-line-args/#environment-variables_1","text":"The Sys.getenv utility ( docs page ) works nearly identically to the Python implementation. > Sys.getenv ( 'HOSTNAME' ) [ 1 ] \"grace2.grace.hpc.yale.internal\" Just like Python, these values are always returned as string representations, so if the variable of interest is a number it will need to be cast into an integer using as.numeric() .","title":"Environment Variables"},{"location":"clusters-at-yale/guides/cmd-line-args/#command-line-arguments_1","text":"To collect command-line arguments in R use the commandArgs function: args = commandArgs ( trailingOnly = TRUE ) for ( x in args ){ print ( x ) } The trailingOnly=TRUE option will limit args to contain only those arguments which follow the script: Rscript my_script.R a b c [ 1 ] \"a\" [ 1 ] \"b\" [ 1 ] \"c\" There is a more advanced and detailed package for managing command-line arguments called optparse ( docs page ). 
This can be used to create more featured scripts in a similar way to Python's argparse .","title":"Command-line Arguments"},{"location":"clusters-at-yale/guides/cmd-line-args/#slurm-environment-variables","text":"Slurm sets a number of environment variables detailing the layout of every job. These include: SLURM_JOB_ID : the unique jobid given to each job. Useful to set unique output directories SLURM_CPUS_PER_TASK : the number of CPUs allocated for each task. Useful as a replacement for R's detectCores or Python's multiprocessing.cpu_count which report the physical number of CPUs and not the number allocated by Slurm. SLURM_ARRAY_TASK_ID : the unique array index for each element of a job array. Useful to un-roll a loop or to set a unique random seed for parallel simulations. These can be leveraged within batch scripts using the above techniques to either pass on the command-line or directly reading the environment variable to control how a script runs. For example, if a script previously looped over values ranging from 0-9, we can modify the script and create a job array which runs each iteration separately in parallel using SLURM_ARRAY_TASK_ID to tell each element of the job array which value to use.","title":"Slurm Environment Variables"},{"location":"clusters-at-yale/guides/comsol/","text":"COMSOL YCRC has COMSOL Multiphysics 5.2a available on Grace. It can be used to run basic physical and multiphysics models on one node utilizing multiple cores. If you need to run run models across multiple nodes or need to run COMSOL on your local machine, please contact us . Use COMSOL To use COMSOL on the cluster, load the COMSOL module by running module load COMSOL/5.2a-classkit . For more information on our modules, please see our software modules documentation. COMSOL has a resource intenstive GUI and, therefore, we strongly recommend using COMSOL in a Remote Desktop session on the Open OnDemand web portal . To launch COMSOL in your Remote Desktop, open the terminal application in the session and enter the following commands: module load COMSOL/5.2a-classkit comsol -np $SLURM_CPUS_ON_NODE & Run COMSOL in Batch Mode Comsol can be run without the graphical interface assuming you have a model file and a study defined beforehand. This is particularly useful for parametric sweeps or scanning over a range of values for specific parameters. For example: comsol batch -configuration /tmp -data /tmp -prefsdir /tmp -inputfile mymodel.mph -outputfile out.mph -study std1 which will run the study std1 found within the mymodel.mph file generated through the COMSOL GUI and save the outputs in out.mph . A parameter can be passed into the study like this: comsol batch -inputfile mymodel.mph -outputfile out.mph -pname L -plist 8[cm],10[cm],12[cm] Which will run three versions of the model sequentially for each of the three values of L enumerated. When combined with Slurm Job Arrays many COMSOL jobs can be run in parallel. An example dSQ job-file would look like: module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_8.mph -pname L -plist 8 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_10.mph -pname L -plist 10 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_12.mph -pname L -plist 12 [ cm ] Which would run three versions of the study using different values of L and save their outputs in separate files. Be careful to provide a different output file for each line to avoid clashes between the separate jobs. 
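Once a job file like the one above is written, it can be handed to dSQ to build the corresponding job array. Below is a minimal sketch; the file name comsol_jobs.txt and the resource requests are illustrative placeholders, and the exact options should be checked against the YCRC dSQ documentation.

```bash
# comsol_jobs.txt holds one "module load COMSOL ; comsol batch ..." command per line
module load dSQ

# generate a Slurm batch script with one array task per line of the job file;
# extra sbatch-style options (CPUs, memory, walltime) are passed through
dsq --job-file comsol_jobs.txt --cpus-per-task 4 --mem-per-cpu 8G -t 6:00:00

# dSQ should write a dsq-*.sh batch script in the current directory; submit it with sbatch
```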
More details can be found on the COMSOL documentation site . Details of COMSOL on YCRC Clusters Two COMSOL modules (Heat Transfer and Structural Mechanics) are included in addition to the main multiphysics engine. The following models might be solved using our COMSOL package both in stationary and time dependent studies. AC/DC. Electric Currents and Electrostatics in 1D, 2D, 3D models. Magnetic Fields in 2D. Acoustics. Pressure acoustics in frequency domain in 1D, 2D, 3D models. Chemical Species Transport. Transport of Diluted Species in 1D, 2D, 3D models. Transport and reactions of the species dissolved in a gas, liquid, or solid can be handled with this interface. The driving forces for transport can be diffusion, convection when coupled to a flow field, and migration, when coupled to an electric field. Moisture Transport in 1D, 2D, 3D is used to model moisture transfer in a porous medium. Fluid Flow. Single Phase Laminar and Turbulent Flow including non-isothermal flow in 2D, 3D models. Fluid-Structure Interaction in 2D, 3D models for both fixed geometry and deformed solid. Heat Transfer in 1D, 2D, 3D models. HT in Solids and Fluids. HT in porous media including non-equilibrium transfer. Bioheat transfer. Surface to Surface Radiation. Joule Heating. HT in thin structures (2D, 3D) like shells, films, fractures. Conjugate HT from laminar and turbulent flows (2D, 3D). Heat and moisture transport. Thermoelastic effect. Plasma in 1D. Equilibrium DC Discharges that are sustained by a static or slow-varying electric field where induction currents and fluid flow effects are negligible. Structural Mechanics in 2D, 3D models. Solid Mechanics (elastic). Plate Truss in 2D. Beam, Truss (2D, 3D). Membrane (2D axisymmetric, 3D). Shell (3D). Thermal stress. Thermal expansion. Piezoelectricity. General Mathematics equations in 1D, 2D, 3D models. Classic PDE. Coefficient based and general form PDE. Wave form PDE. Weak form PDE. Ordinary differential equations and algebraic equations. Deformed geometry and moving mesh. Curvilinear coordinates. All above models can be used in the Multiphysics approach of coupling them together. They can be solved in Full Couple mode or by using Segregated Solver (solving one physical model and using resulting field to model another, and so on). Backward Compatibility COMSOL is not backwards compatible. If you have a project file from a newer version of COMSOL (e.g. 5.3), it will not open in 5.2a. However, in some circumstances, we can assist with porting the project file back to version 5.2a. If you have any questions about this, please contact us . Limitations of Available License Please note that some commonly used COMSOL features such as CAD Import Module, Material Library, and MatLab Link are not included in the license. COMSOL Material Library consists of about 2500 different materials with their physical properties. Many of them are included with temperature dependancies. Without this library you have to specify material parameters manually, however, you can save your new material for future use. We can help in adding material form COMSOL library to your project file using a different license. You cannot import geometry designed by external CAD program like SolidWorks, Autocad, etc. Instead you have to design it inside COMSOL. However, we can help you to perform such import utilizing different license; we\u2019ll save it in COMSOL project file and you would be able to open it with already imported geometry. 
More advanced users often use MatLab for automation of COMSOL models and extracting results data for mining them by external methods available in MatLab. Unfortunately, you cannot do this with the license available on the cluster. Please contact us if you feel you need to utilize MatLab. Lastly, our license does not allow to use COMSOL for solving models based on Maxwell Equations (RF, Wave Optics), semiconductor models, particle tracing, ray optics, non-linear mechanics, and some other more advanced modules. To approach such models in COMSOL on your local computer, please contact us to use our more general license with very limited number of licensed seats.","title":"COMSOL"},{"location":"clusters-at-yale/guides/comsol/#comsol","text":"YCRC has COMSOL Multiphysics 5.2a available on Grace. It can be used to run basic physical and multiphysics models on one node utilizing multiple cores. If you need to run run models across multiple nodes or need to run COMSOL on your local machine, please contact us .","title":"COMSOL"},{"location":"clusters-at-yale/guides/comsol/#use-comsol","text":"To use COMSOL on the cluster, load the COMSOL module by running module load COMSOL/5.2a-classkit . For more information on our modules, please see our software modules documentation. COMSOL has a resource intenstive GUI and, therefore, we strongly recommend using COMSOL in a Remote Desktop session on the Open OnDemand web portal . To launch COMSOL in your Remote Desktop, open the terminal application in the session and enter the following commands: module load COMSOL/5.2a-classkit comsol -np $SLURM_CPUS_ON_NODE &","title":"Use COMSOL"},{"location":"clusters-at-yale/guides/comsol/#run-comsol-in-batch-mode","text":"Comsol can be run without the graphical interface assuming you have a model file and a study defined beforehand. This is particularly useful for parametric sweeps or scanning over a range of values for specific parameters. For example: comsol batch -configuration /tmp -data /tmp -prefsdir /tmp -inputfile mymodel.mph -outputfile out.mph -study std1 which will run the study std1 found within the mymodel.mph file generated through the COMSOL GUI and save the outputs in out.mph . A parameter can be passed into the study like this: comsol batch -inputfile mymodel.mph -outputfile out.mph -pname L -plist 8[cm],10[cm],12[cm] Which will run three versions of the model sequentially for each of the three values of L enumerated. When combined with Slurm Job Arrays many COMSOL jobs can be run in parallel. An example dSQ job-file would look like: module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_8.mph -pname L -plist 8 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_10.mph -pname L -plist 10 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_12.mph -pname L -plist 12 [ cm ] Which would run three versions of the study using different values of L and save their outputs in separate files. Be careful to provide a different output file for each line to avoid clashes between the separate jobs. More details can be found on the COMSOL documentation site .","title":"Run COMSOL in Batch Mode"},{"location":"clusters-at-yale/guides/comsol/#details-of-comsol-on-ycrc-clusters","text":"Two COMSOL modules (Heat Transfer and Structural Mechanics) are included in addition to the main multiphysics engine. The following models might be solved using our COMSOL package both in stationary and time dependent studies. AC/DC. 
Electric Currents and Electrostatics in 1D, 2D, 3D models. Magnetic Fields in 2D. Acoustics. Pressure acoustics in frequency domain in 1D, 2D, 3D models. Chemical Species Transport. Transport of Diluted Species in 1D, 2D, 3D models. Transport and reactions of the species dissolved in a gas, liquid, or solid can be handled with this interface. The driving forces for transport can be diffusion, convection when coupled to a flow field, and migration, when coupled to an electric field. Moisture Transport in 1D, 2D, 3D is used to model moisture transfer in a porous medium. Fluid Flow. Single Phase Laminar and Turbulent Flow including non-isothermal flow in 2D, 3D models. Fluid-Structure Interaction in 2D, 3D models for both fixed geometry and deformed solid. Heat Transfer in 1D, 2D, 3D models. HT in Solids and Fluids. HT in porous media including non-equilibrium transfer. Bioheat transfer. Surface to Surface Radiation. Joule Heating. HT in thin structures (2D, 3D) like shells, films, fractures. Conjugate HT from laminar and turbulent flows (2D, 3D). Heat and moisture transport. Thermoelastic effect. Plasma in 1D. Equilibrium DC Discharges that are sustained by a static or slow-varying electric field where induction currents and fluid flow effects are negligible. Structural Mechanics in 2D, 3D models. Solid Mechanics (elastic). Plate Truss in 2D. Beam, Truss (2D, 3D). Membrane (2D axisymmetric, 3D). Shell (3D). Thermal stress. Thermal expansion. Piezoelectricity. General Mathematics equations in 1D, 2D, 3D models. Classic PDE. Coefficient based and general form PDE. Wave form PDE. Weak form PDE. Ordinary differential equations and algebraic equations. Deformed geometry and moving mesh. Curvilinear coordinates. All above models can be used in the Multiphysics approach of coupling them together. They can be solved in Full Couple mode or by using Segregated Solver (solving one physical model and using resulting field to model another, and so on).","title":"Details of COMSOL on YCRC Clusters"},{"location":"clusters-at-yale/guides/comsol/#backward-compatibility","text":"COMSOL is not backwards compatible. If you have a project file from a newer version of COMSOL (e.g. 5.3), it will not open in 5.2a. However, in some circumstances, we can assist with porting the project file back to version 5.2a. If you have any questions about this, please contact us .","title":"Backward Compatibility"},{"location":"clusters-at-yale/guides/comsol/#limitations-of-available-license","text":"Please note that some commonly used COMSOL features such as CAD Import Module, Material Library, and MatLab Link are not included in the license. COMSOL Material Library consists of about 2500 different materials with their physical properties. Many of them are included with temperature dependancies. Without this library you have to specify material parameters manually, however, you can save your new material for future use. We can help in adding material form COMSOL library to your project file using a different license. You cannot import geometry designed by external CAD program like SolidWorks, Autocad, etc. Instead you have to design it inside COMSOL. However, we can help you to perform such import utilizing different license; we\u2019ll save it in COMSOL project file and you would be able to open it with already imported geometry. More advanced users often use MatLab for automation of COMSOL models and extracting results data for mining them by external methods available in MatLab. 
Unfortunately, you cannot do this with the license available on the cluster. Please contact us if you feel you need to utilize MatLab. Lastly, our license does not allow to use COMSOL for solving models based on Maxwell Equations (RF, Wave Optics), semiconductor models, particle tracing, ray optics, non-linear mechanics, and some other more advanced modules. To approach such models in COMSOL on your local computer, please contact us to use our more general license with very limited number of licensed seats.","title":"Limitations of Available License"},{"location":"clusters-at-yale/guides/conda/","text":"Conda Conda is a package, dependency, and environment manager. It allows you to maintain different, often incompatible, sets of applications side-by-side. It has become a popular choice for managing pipelines that involve several tools, especially when multiple languages are involved. These sets of applications and their dependencies are kept in Conda environments, which you can switch between as your work dictates. Compared to the modules that we provide, there are often newer and more varied packages available that you can manage yourself, but they may not be as well optimized for the clusters. See Conda's official command-line reference and the offical docs for managing environments for detailed instructions. Here we present essential instructions and site-specific info. Warning Mixing modules and conda-managed software is almost never a good idea. When constructing an environment for your work you should load either modules or a conda environment. If you get stuck, you can always ask us for help . The Miniconda Module For your convenience, we provide a relatively recent version of Miniconda as a module. This is a read-only environment from which you can create your own. We set some defaults for you in this module, and we keep it relatively up-to-date so you don't have to. If you are using Conda-installed packages, this should be the only module you load in your jobs. Note: If you are on Milgram and run out of space in your home directory for Conda, you can either reinstall your environment in your project space (see below) or contact us for help with your home quota. Defaults We Set On all clusters, we set the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables to conda_envs and conda_pkgs in your project directory where there is more quota available. Conda will install to and search in these directories for environments and cached packages. Starting with minconda module version 4.8.3 we set the default channels (the sources to find packages) to conda-forge and bioconda , which provide a wider array of packages than the default channels do. We have found it saves a lot of typing. If you would like to override these defaults, see the Conda docs on managing channels. Below is the .condarc for the miniconda module. env_prompt : '({name})' auto_activate_base : false channels : - conda-forge - bioconda - defaults Setup Your Environment Load the miniconda Module module load miniconda You can save this to your default module collection by using module save . See our module documentation for more details. Create a conda Environment To create an environment use the conda create command. Environment files are saved to the first path in $CONDA_ENVS_PATH , or where you specify with the --prefix option. You should give your environments names that are meaningful to you, so you can more easily keep track of their purposes. 
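If you would rather keep an environment alongside a particular project than under the default $CONDA_ENVS_PATH location, you can create and activate it by path with the --prefix option mentioned above. A minimal sketch, using a hypothetical project directory:

```bash
module load miniconda

# create the environment in an explicit directory instead of the default location
conda create --prefix /gpfs/gibbs/project/mygroup/netid/envs/my_env python numpy

# activate by path rather than by name
conda activate /gpfs/gibbs/project/mygroup/netid/envs/my_env
```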
Because dependency resolution is hard and messy, we find specifying as many packages as possible at environment creation time can help minimize broken dependencies. Although sometimes unavoidable for Python, we recommend against heavily mixing the use of conda and pip to install applications. If needed, try to get as much installed with conda , then use pip to get the rest of the way to your desired environment. Tip For added reproducibility and control, specify versions of packages to be installed using conda with packagename=version syntax. E.g. numpy=1.14 For example, if you have a legacy application that needs Python 2 and OpenBLAS: module load miniconda conda create -n legacy_application python = 2 .7 openblas If you want a good starting point for interactive data science in R/Python Jupyter Notebooks: module load miniconda conda create -n ds_notebook python numpy scipy pandas matplotlib ipython jupyter r-irkernel r-ggplot2 r-tidyverse Note that you can also install jupyterlab instead of, or alongside jupyter. Conda Channels Community-led collections of packages that you can install with conda are provided with channels. Some labs will provide their own software using this method. A few popular examples are Conda Forge and Bioconda , which we set for you by default. See the Conda docs for more info about managing channels. You can create a new environment called brian2 (specified with the -n option) and install Brian2 into it with the following: module load miniconda conda create -n brian2 brian2 # normally you would need this: # conda create -n brian2 --channel conda-forge brian2 You can also install packages from Bioconda, for example: module load miniconda conda create -n bioinfo biopython bedtools bowtie2 repeatmasker # normally you would need this: # conda create -n bioinfo --channel conda-forge --channel bioconda biopython bedtools bowtie2 repeatmasker Mamba: The Conda Alternative For complicated environments, conda can often struggle to \"solve\" the required set of packages in a reasonable time. An alternative tool, called mamba , has been developed, bringing a faster dependency solver based on libsolv , which is used in modern RPM package managers. mamba is a drop-in replacement for conda and environments can be created or new packages installed in the same way as with conda : module load miniconda # create new environment mamba create --name env_name python numpy pandas jupyter # install new package into existing environment conda activate env_name mamba install scipy scikit-learn The mamba utility is installed in the YCRC base environment and is available for general use. For more details, see the Mamba GitHub page . Use Your Environment To use the applications in your environment, run the following: module load miniconda conda activate env_name Warning We recommend against putting source activate or conda activate commands in your ~/.bashrc file. This can lead to issues in interactive or batch jobs. If you have issues with an environment, try re-loading the environment by calling conda deactivate before rerunning conda activate env_name . Interactive Your Conda environments will not follow you into job allocations. Make sure to activate them after your interactive job begins.
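For example, a minimal sketch of starting an interactive allocation and then activating an environment inside it; the resource requests are placeholders and env_name is whatever environment you created earlier:

```bash
# request an interactive allocation first
salloc --cpus-per-task=2 --mem=8G --time=2:00:00

# then, in the shell the allocation gives you, load and activate
module load miniconda
conda activate env_name
```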
In a Job Script To make sure that you are running in your project environment in a submission script, make sure to include the following lines in your submission script before running any other commands or scripts (but after your Slurm directives ): #!/bin/bash #SBATCH --partition=general #SBATCH --job-name=my_conda_job #SBATCH --cpus-per-task 4 #SBATCH --mem-per-cpu=6000 module load miniconda conda activate env_name python analyses.py Find and Install Additional Packages You can search Anaconda Cloud or use conda search to find the names of packages you would like to install: module load miniconda conda search numpy Compiling Codes You may need to compile codes in a conda environment, for example, installing an R package in a conda R env. This requires you to have the GNU C compiler and its development libraries installed in the conda env before compiling any codes: conda install gcc_linux-64 Without gcc_linux-64 , the code will be compiled using the system compiler and libraries. You will experience run-time errors when running the code in the conda environment. Troubleshoot Conda version doesn't match the module loaded If you have run conda init in the past, you may be locked to an old version of conda . You can run the following to fix this: sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Permission Denied If you get a permission denied error when running conda install or pip install for a package, make sure you have created an environment and activated it or activated an existing one first. bash: conda: No such file or directory If you get the above error, it is likely that you don't have the necessary module file loaded. Try loading the minconda module and rerunning your conda activate env_name command. Could not find environment This error means that the version of miniconda you have loaded doesn't recognize the environment name you have supplied. Make sure you have the miniconda module loaded (and not a different Python module) and have previously created this environment. You can see a list of previously created environments by running: module load miniconda conda info --envs Additional Conda Commands List Installed Packages module load miniconda conda list --name env_name Delete a Conda Environment module load miniconda conda remove --name env_name --all Save and Export Environments There are two concepts for rebuilding conda environments: a copy of an existing environment, with identical versions of each package a fresh build following the same steps taken to creat the first environment (letting unspecified versions float) This short doc will walk through recommended approaches to both styles of exporting and rebuilding a generic environment named test containing python, jupyter, numpy, and scipy. Full Export Including Dependencies To export the exact versions of each package installed (including all dependencies) run: module load miniconda conda env export --no-builds --name test | grep -v prefix > test_export.yaml This yaml file is ~230 lines long and contains every package that is installed in the test environment. The conda export command includes information about the path where it was installed (i.e. the prefix ). To remove this hard-coded path, we need to remove the line in this print out related to the \"prefix\". Export Only Specified Packages If we simply wish to rebuild the environment using the steps previously employed to create it, we can replace --no-builds with --from-history . 
module load miniconda conda env export --from-history --name test | grep -v prefix > test_export.yaml This is a much smaller file, ~10 lines, and only lists the packages explicitly installed: name: test channels: - conda-forge - defaults - bioconda dependencies: - scipy - numpy=1.21 - jupyter - python=3.8 In this environment, the versions of python and numpy were pinned during installation, but scipy and jupyter were left to get the most recent compatible version. Build a New Environment To create a new environment using all the enumerated pacakges: module load miniconda conda env create --file test_export.yaml This will create a new environment with the same name test . The yaml file can be edited to change the name of the new environment.","title":"Conda"},{"location":"clusters-at-yale/guides/conda/#conda","text":"Conda is a package, dependency, and environment manager. It allows you to maintain different, often incompatible, sets of applications side-by-side. It has become a popular choice for managing pipelines that involve several tools, especially when multiple languages are involved. These sets of applications and their dependencies are kept in Conda environments, which you can switch between as your work dictates. Compared to the modules that we provide, there are often newer and more varied packages available that you can manage yourself, but they may not be as well optimized for the clusters. See Conda's official command-line reference and the offical docs for managing environments for detailed instructions. Here we present essential instructions and site-specific info. Warning Mixing modules and conda-managed software is almost never a good idea. When constructing an environment for your work you should load either modules or a conda environment. If you get stuck, you can always ask us for help .","title":"Conda"},{"location":"clusters-at-yale/guides/conda/#the-miniconda-module","text":"For your convenience, we provide a relatively recent version of Miniconda as a module. This is a read-only environment from which you can create your own. We set some defaults for you in this module, and we keep it relatively up-to-date so you don't have to. If you are using Conda-installed packages, this should be the only module you load in your jobs. Note: If you are on Milgram and run out of space in your home directory for Conda, you can either reinstall your environment in your project space (see below) or contact us for help with your home quota.","title":"The Miniconda Module"},{"location":"clusters-at-yale/guides/conda/#defaults-we-set","text":"On all clusters, we set the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables to conda_envs and conda_pkgs in your project directory where there is more quota available. Conda will install to and search in these directories for environments and cached packages. Starting with minconda module version 4.8.3 we set the default channels (the sources to find packages) to conda-forge and bioconda , which provide a wider array of packages than the default channels do. We have found it saves a lot of typing. If you would like to override these defaults, see the Conda docs on managing channels. Below is the .condarc for the miniconda module. 
env_prompt : '({name})' auto_activate_base : false channels : - conda-forge - bioconda - defaults","title":"Defaults We Set"},{"location":"clusters-at-yale/guides/conda/#setup-your-environment","text":"","title":"Setup Your Environment"},{"location":"clusters-at-yale/guides/conda/#load-the-miniconda-module","text":"module load miniconda You can save this to your default module collection by using module save . See our module documentation for more details.","title":"Load the miniconda Module"},{"location":"clusters-at-yale/guides/conda/#create-a-conda-environment","text":"To create an environment use the conda create command. Environment files are saved to the first path in $CONDA_ENVS_PATH , or where you specify with the --prefix option. You should give your environments names that are meaningful to you, so you can more easily keep track of their purposes. Because dependency resolution is hard and messy, we find specifying as many packages as possible at environment creation time can help minimize broken dependencies. Although sometimes unavoidable for Python, we recommend against heavily mixing the use of conda and pip to install applications. If needed, try to get as much installed with conda , then use pip to get the rest of the way to your desired environment. Tip For added reproducibility and control, specify versions of packages to be installed using conda with packagename=version syntax. E.g. numpy=1.14 For example, if you have a legacy application that needs Python 2 and OpenBLAS: module load miniconda conda create -n legacy_application python = 2 .7 openblas If you want a good starting point for interactive data science in R/Python Jupyter Notebooks: module load miniconda conda create -n ds_notebook python numpy scipy pandas matplotlib ipython jupyter r-irkernel r-ggplot2 r-tidyverse Note that you can also install jupyterlab instead of, or alongside jupyter.","title":"Create a conda Environment"},{"location":"clusters-at-yale/guides/conda/#conda-channels","text":"Community-lead collections of packages that you can install with conda are provided with channels. Some labs will provide their own software using this method. A few popular examples are Conda Forge and Bioconda , which we set for you by default. See the Conda docs for more info about managing channels. You can create a new environment called brian2 (specified with the -n option) and install Brian2 into it with the following: module load miniconda conda create -n brian2 brian2 # normally you would need this: # conda create -n brian2 --channel conda-forge brian2 You can also install packages from Bioconda, for example: module load miniconda conda create -n bioinfo biopython bedtools bowtie2 repeatmasker # normally you would need this: # conda create -n bioinfo --channel conda-forge --channel bioconda biopython bedtools bowtie2 repeatmasker","title":"Conda Channels"},{"location":"clusters-at-yale/guides/conda/#mamba-the-conda-alternative","text":"For complicated environments, conda can often strugle to \"solve\" the required set of packages in a reasonable time. An alternative tool, called mamba , has been developed, bringing a faster dependency solver based on libsolv , which is used in modern RPM package managers. 
mamba is a drop-in replacement for conda and environments can be created or new packages installed in the same way as with conda : module load miniconda # create new environment mamba create --name env_name python numpy pandas jupyter # install new pacakge into existing environment conda activate env_name mamba install scipy scikit-learn The mamba utility is installed in the YCRC base environment and is available for general use. For more details, see the Mamba GitHub page .","title":"Mamba: The Conda Alternative"},{"location":"clusters-at-yale/guides/conda/#use-your-environment","text":"To use the applications in your environment, run the following: module load miniconda conda activate env_name Warning We recommend against putting source activate or conda activate commands in your ~/.bashrc file. This can lead to issues in interactive or batch jobs. If you have issues with an environment, trying re-loading the environment by calling conda deactivate before rerunning conda activate env_name .","title":"Use Your Environment"},{"location":"clusters-at-yale/guides/conda/#interactive","text":"Your Conda environments will not follow you into job allocations. Make sure to activate them after your interactive job begins.","title":"Interactive"},{"location":"clusters-at-yale/guides/conda/#in-a-job-script","text":"To make sure that you are running in your project environment in a submission script, make sure to include the following lines in your submission script before running any other commands or scripts (but after your Slurm directives ): #!/bin/bash #SBATCH --partition=general #SBATCH --job-name=my_conda_job #SBATCH --cpus-per-task 4 #SBATCH --mem-per-cpu=6000 module load miniconda conda activate env_name python analyses.py","title":"In a Job Script"},{"location":"clusters-at-yale/guides/conda/#find-and-install-additional-packages","text":"You can search Anaconda Cloud or use conda search to find the names of packages you would like to install: module load miniconda conda search numpy","title":"Find and Install Additional Packages"},{"location":"clusters-at-yale/guides/conda/#compiling-codes","text":"You may need to compile codes in a conda environment, for example, installing an R package in a conda R env. This requires you to have the GNU C compiler and its development libraries installed in the conda env before compiling any codes: conda install gcc_linux-64 Without gcc_linux-64 , the code will be compiled using the system compiler and libraries. You will experience run-time errors when running the code in the conda environment.","title":"Compiling Codes"},{"location":"clusters-at-yale/guides/conda/#troubleshoot","text":"","title":"Troubleshoot"},{"location":"clusters-at-yale/guides/conda/#conda-version-doesnt-match-the-module-loaded","text":"If you have run conda init in the past, you may be locked to an old version of conda . You can run the following to fix this: sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc","title":"Conda version doesn't match the module loaded"},{"location":"clusters-at-yale/guides/conda/#permission-denied","text":"If you get a permission denied error when running conda install or pip install for a package, make sure you have created an environment and activated it or activated an existing one first.","title":"Permission Denied"},{"location":"clusters-at-yale/guides/conda/#bash-conda-no-such-file-or-directory","text":"If you get the above error, it is likely that you don't have the necessary module file loaded. 
Try loading the minconda module and rerunning your conda activate env_name command.","title":"bash: conda: No such file or directory"},{"location":"clusters-at-yale/guides/conda/#could-not-find-environment","text":"This error means that the version of miniconda you have loaded doesn't recognize the environment name you have supplied. Make sure you have the miniconda module loaded (and not a different Python module) and have previously created this environment. You can see a list of previously created environments by running: module load miniconda conda info --envs","title":"Could not find environment"},{"location":"clusters-at-yale/guides/conda/#additional-conda-commands","text":"","title":"Additional Conda Commands"},{"location":"clusters-at-yale/guides/conda/#list-installed-packages","text":"module load miniconda conda list --name env_name","title":"List Installed Packages"},{"location":"clusters-at-yale/guides/conda/#delete-a-conda-environment","text":"module load miniconda conda remove --name env_name --all","title":"Delete a Conda Environment"},{"location":"clusters-at-yale/guides/conda/#save-and-export-environments","text":"There are two concepts for rebuilding conda environments: a copy of an existing environment, with identical versions of each package a fresh build following the same steps taken to creat the first environment (letting unspecified versions float) This short doc will walk through recommended approaches to both styles of exporting and rebuilding a generic environment named test containing python, jupyter, numpy, and scipy.","title":"Save and Export Environments"},{"location":"clusters-at-yale/guides/conda/#full-export-including-dependencies","text":"To export the exact versions of each package installed (including all dependencies) run: module load miniconda conda env export --no-builds --name test | grep -v prefix > test_export.yaml This yaml file is ~230 lines long and contains every package that is installed in the test environment. The conda export command includes information about the path where it was installed (i.e. the prefix ). To remove this hard-coded path, we need to remove the line in this print out related to the \"prefix\".","title":"Full Export Including Dependencies"},{"location":"clusters-at-yale/guides/conda/#export-only-specified-packages","text":"If we simply wish to rebuild the environment using the steps previously employed to create it, we can replace --no-builds with --from-history . module load miniconda conda env export --from-history --name test | grep -v prefix > test_export.yaml This is a much smaller file, ~10 lines, and only lists the packages explicitly installed: name: test channels: - conda-forge - defaults - bioconda dependencies: - scipy - numpy=1.21 - jupyter - python=3.8 In this environment, the versions of python and numpy were pinned during installation, but scipy and jupyter were left to get the most recent compatible version.","title":"Export Only Specified Packages"},{"location":"clusters-at-yale/guides/conda/#build-a-new-environment","text":"To create a new environment using all the enumerated pacakges: module load miniconda conda env create --file test_export.yaml This will create a new environment with the same name test . The yaml file can be edited to change the name of the new environment.","title":"Build a New Environment"},{"location":"clusters-at-yale/guides/containers/","text":"Containers Warning The Singularity project has been renamed Apptainer . 
Everything should still work the same, including the 'singularity' command. If you find it not working as expected, please contact us . Apptainer (formerly Singularity) is a Linux container technology that is well suited to use in shared-user environments such as the clusters we maintain at Yale. It is similar to Docker ; You can bring with you a stack of software, libraries, and a Linux operating system that is independent of the host computer you run the container on. This can be very useful if you want to share your software environment with other researchers or yourself across several computers. Because Apptainer containers run as the user that started them and mount home directories by default, you can usually see the data you're interested in working on that is stored on a host computer without any extra work. Below we will outline some common use cases covering the creation and use of containers. There is also excellent documentation available on the full and official user guide for Apptainer . We are happy to help, just contact us with your questions. Warning On the Yale clusters, Apptainer is not installed on login nodes. You will need to run it from compute nodes. Apptainer Containers Images are the file(s) you use to run your container. Apptainer images are single files that usually end in .sif and are read-only by default, meaning changes you make to the environment inside the container are not persistent. Use a Pre-existing Container If someone has already built a container that suits your needs, you can use it directly. Apptainer images are single files that can be transferred to the clusters. You can fetch images from container registries such as Docker Hub or NVidia Container Registry . Container images can take up a lot of disk space (dozens of gigabytes), so you may want to change the default location Apptainer uses to cache these files. To do this before getting started, you should add something like the example below to to your ~/.bashrc file: # set APPTAINER_CACHEDIR if you want to pull files (which can get big) somewhere other than $HOME/.apptainer # e.g. export APPTAINER_CACHEDIR = ~/scratch60/.apptainer Here are some examples of getting containers already built by someone else with apptainer: # from Docker Hub (https://hub.docker.com/) apptainer build ubuntu-18.10.sif docker://ubuntu:18.10 apptainer build tensorflow-10.0-py3.sif docker://tensorflow/tensorflow:1.10.0-py3 # from Singularity Hub (no longer updated) apptainer build bioconvert-latest.sif shub://biokit/bioconvert:latest Build Your Own Container You can define a container image to be exactly how you want/need it to be, including applications, libraries, and files of your choosing with a definition file . Apptainer definition files are similar to Docker's Dockerfile , but use different syntax. For full definition files and more documentation please see the Apptainer site . Header Every container definition must begin with a header that defines what image to start with, or bootstrap from. This can be an official Linux distribution or someone else's container that gets you nearly what you want. To start from Ubuntu Bionic Beaver (18.04 LTS): Bootstrap: docker From: ubuntu:18.04 Or an Nvidia developer image Bootstrap: docker From: nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04 The rest of the sections all begin with % and the section name. You will see section contents indented by convention, but this is not required. 
%labels The labels section allows you to define metadata for your container: %labels Name Maintainer \"YCRC Support Team\" Version v99.9 Architecture x86_64 URL https://research.computing.yale.edu/ You can examine container metadata with the apptainer inspect command. %files If you'd like to copy any files from the system you are building on, you do so in the %files section. Each line in the files section is a pair of source and destination paths, where the source is on your host system, and destination is a path in the container. %files sample_data.tar /opt/sample_data/ example_script.sh /opt/sample_data/ %post The post section is where you can run updates, installs, etc in your container to customize it. %post echo \"Customizing Ubuntu\" apt-get update apt-get -y install software-properties-common build-essential cmake add-apt-repository universe apt-get update apt-get -y libboost-all-dev libgl1-mesa-dev libglu1-mesa-dev cd /tmp git clone https://github.com/gitdudette/myapp && cd myapp # ... etc etc %environment The environment section allows you to define environment variables for your container. These variables are available when you run the built container, not during its build. %environment export PATH = /opt/my_app/bin: $PATH export LD_LIBRARY_PATH = /opt/my_app/lib: $LD_LIBRARY_PATH Building To finally build your container after saving your definition file as my_app.def , for example, you would run apptainer build my_app.sif my_app.def Use a Container Image Once you have a container image, you can run it as a part of a batch job, or interactively. Interactively To get a shell in a container so you can interactively work in its environment: apptainer shell --shell /bin/bash containername.sif In a Job Script You can also run applications from your container non-interactively as you would in a batch job. If I wanted to run a script called my_script.py using my container's python: apptainer exec containername.sif python my_script.py Environment Variables If you are unsure if you are running inside or outside your container, you can run: echo $APPTAINER_NAME If you get back text, you are in your container. If you'd like to pass environment variables into your container, you can do so by defining them prefixed with APPTAINERENV_ . For Example: export APPTAINERENV_BLASTDB = /home/me/db/blast apptainer exec my_blast_image.sif env | grep BLAST Should return BLASTDB=/home/me/db/blast , which means you set the BLASTDB environment variable in the container properly. Additional Notes MPI MPI support for Apptainer is relatively straight-forward. The only thing to watch is to make sure that you are using the same version of MPI inside your container as you are on the cluster. GPUs You can use GPU-accelerated code inside your container, which will need most everything also installed in your container (e.g. CUDA, cuDNN). In order for your applications to have access to the right drivers on the host machine, use the --nv flag. For example: apptainer exec --nv tensorflow-10.0-py3.sif python ./my-tf-model.py Home Directories Sometimes the maintainer of a Docker container you are trying to use installed software into a special user's home directory. If you need access to someone's home directory that exists in the container and not on the host, you should add the --contain option. Unfortunately, you will also then have to explicitly tell Apptainer about the paths that you want to use from inside the container with the --bind option. 
apptainer shell --shell /bin/bash --contain --bind /gpfs/gibbs/project/support/be59:/home/be59/project bioconvert-latest.sif","title":"Containers"},{"location":"clusters-at-yale/guides/containers/#containers","text":"Warning The Singularity project has been renamed Apptainer . Everything should still work the same, including the 'singularity' command. If you find it not working as expected, please contact us . Apptainer (formerly Singularity) is a Linux container technology that is well suited to use in shared-user environments such as the clusters we maintain at Yale. It is similar to Docker ; You can bring with you a stack of software, libraries, and a Linux operating system that is independent of the host computer you run the container on. This can be very useful if you want to share your software environment with other researchers or yourself across several computers. Because Apptainer containers run as the user that started them and mount home directories by default, you can usually see the data you're interested in working on that is stored on a host computer without any extra work. Below we will outline some common use cases covering the creation and use of containers. There is also excellent documentation available on the full and official user guide for Apptainer . We are happy to help, just contact us with your questions. Warning On the Yale clusters, Apptainer is not installed on login nodes. You will need to run it from compute nodes.","title":"Containers"},{"location":"clusters-at-yale/guides/containers/#apptainer-containers","text":"Images are the file(s) you use to run your container. Apptainer images are single files that usually end in .sif and are read-only by default, meaning changes you make to the environment inside the container are not persistent.","title":"Apptainer Containers"},{"location":"clusters-at-yale/guides/containers/#use-a-pre-existing-container","text":"If someone has already built a container that suits your needs, you can use it directly. Apptainer images are single files that can be transferred to the clusters. You can fetch images from container registries such as Docker Hub or NVidia Container Registry . Container images can take up a lot of disk space (dozens of gigabytes), so you may want to change the default location Apptainer uses to cache these files. To do this before getting started, you should add something like the example below to to your ~/.bashrc file: # set APPTAINER_CACHEDIR if you want to pull files (which can get big) somewhere other than $HOME/.apptainer # e.g. export APPTAINER_CACHEDIR = ~/scratch60/.apptainer Here are some examples of getting containers already built by someone else with apptainer: # from Docker Hub (https://hub.docker.com/) apptainer build ubuntu-18.10.sif docker://ubuntu:18.10 apptainer build tensorflow-10.0-py3.sif docker://tensorflow/tensorflow:1.10.0-py3 # from Singularity Hub (no longer updated) apptainer build bioconvert-latest.sif shub://biokit/bioconvert:latest","title":"Use a Pre-existing Container"},{"location":"clusters-at-yale/guides/containers/#build-your-own-container","text":"You can define a container image to be exactly how you want/need it to be, including applications, libraries, and files of your choosing with a definition file . Apptainer definition files are similar to Docker's Dockerfile , but use different syntax. 
For full definition files and more documentation please see the Apptainer site .","title":"Build Your Own Container"},{"location":"clusters-at-yale/guides/containers/#header","text":"Every container definition must begin with a header that defines what image to start with, or bootstrap from. This can be an official Linux distribution or someone else's container that gets you nearly what you want. To start from Ubuntu Bionic Beaver (18.04 LTS): Bootstrap: docker From: ubuntu:18.04 Or an Nvidia developer image Bootstrap: docker From: nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04 The rest of the sections all begin with % and the section name. You will see section contents indented by convention, but this is not required.","title":"Header"},{"location":"clusters-at-yale/guides/containers/#labels","text":"The labels section allows you to define metadata for your container: %labels Name Maintainer \"YCRC Support Team\" Version v99.9 Architecture x86_64 URL https://research.computing.yale.edu/ You can examine container metadata with the apptainer inspect command.","title":"%labels"},{"location":"clusters-at-yale/guides/containers/#files","text":"If you'd like to copy any files from the system you are building on, you do so in the %files section. Each line in the files section is a pair of source and destination paths, where the source is on your host system, and destination is a path in the container. %files sample_data.tar /opt/sample_data/ example_script.sh /opt/sample_data/","title":"%files"},{"location":"clusters-at-yale/guides/containers/#post","text":"The post section is where you can run updates, installs, etc. in your container to customize it. %post echo \"Customizing Ubuntu\" apt-get update apt-get -y install software-properties-common build-essential cmake add-apt-repository universe apt-get update apt-get -y install libboost-all-dev libgl1-mesa-dev libglu1-mesa-dev cd /tmp git clone https://github.com/gitdudette/myapp && cd myapp # ... etc etc","title":"%post"},{"location":"clusters-at-yale/guides/containers/#environment","text":"The environment section allows you to define environment variables for your container. These variables are available when you run the built container, not during its build. %environment export PATH = /opt/my_app/bin: $PATH export LD_LIBRARY_PATH = /opt/my_app/lib: $LD_LIBRARY_PATH","title":"%environment"},{"location":"clusters-at-yale/guides/containers/#building","text":"To finally build your container after saving your definition file as my_app.def , for example, you would run apptainer build my_app.sif my_app.def","title":"Building"},{"location":"clusters-at-yale/guides/containers/#use-a-container-image","text":"Once you have a container image, you can run it as a part of a batch job, or interactively.","title":"Use a Container Image"},{"location":"clusters-at-yale/guides/containers/#interactively","text":"To get a shell in a container so you can interactively work in its environment: apptainer shell --shell /bin/bash containername.sif","title":"Interactively"},{"location":"clusters-at-yale/guides/containers/#in-a-job-script","text":"You can also run applications from your container non-interactively as you would in a batch job.
If I wanted to run a script called my_script.py using my container's python: apptainer exec containername.sif python my_script.py","title":"In a Job Script"},{"location":"clusters-at-yale/guides/containers/#environment-variables","text":"If you are unsure if you are running inside or outside your container, you can run: echo $APPTAINER_NAME If you get back text, you are in your container. If you'd like to pass environment variables into your container, you can do so by defining them prefixed with APPTAINERENV_ . For Example: export APPTAINERENV_BLASTDB = /home/me/db/blast apptainer exec my_blast_image.sif env | grep BLAST Should return BLASTDB=/home/me/db/blast , which means you set the BLASTDB environment variable in the container properly.","title":"Environment Variables"},{"location":"clusters-at-yale/guides/containers/#additional-notes","text":"","title":"Additional Notes"},{"location":"clusters-at-yale/guides/containers/#mpi","text":"MPI support for Apptainer is relatively straight-forward. The only thing to watch is to make sure that you are using the same version of MPI inside your container as you are on the cluster.","title":"MPI"},{"location":"clusters-at-yale/guides/containers/#gpus","text":"You can use GPU-accelerated code inside your container, which will need most everything also installed in your container (e.g. CUDA, cuDNN). In order for your applications to have access to the right drivers on the host machine, use the --nv flag. For example: apptainer exec --nv tensorflow-10.0-py3.sif python ./my-tf-model.py","title":"GPUs"},{"location":"clusters-at-yale/guides/containers/#home-directories","text":"Sometimes the maintainer of a Docker container you are trying to use installed software into a special user's home directory. If you need access to someone's home directory that exists in the container and not on the host, you should add the --contain option. Unfortunately, you will also then have to explicitly tell Apptainer about the paths that you want to use from inside the container with the --bind option. apptainer shell --shell /bin/bash --contain --bind /gpfs/gibbs/project/support/be59:/home/be59/project bioconvert-latest.sif","title":"Home Directories"},{"location":"clusters-at-yale/guides/cryoem/","text":"Cryogenic Electron Microscopy (Cryo-EM) Data Processing on McCleary Below is a work in progress collection of general hints, tips and tricks for running your work on McCleary . As always, if anything below is unclear or could use updating, please let us know during office hours, via email or through our web ticketing system . Storage Be wary of you and your group's storage quotas. Run getquota from time to time to make sure there isn't usage you aren't expecting. We strongly recommend that you archive raw data off-cluster, as only home directories are backed up . Let us know if you need extra space and we can work with you to find a solution that is right for your project and your group. On most GPU nodes there is a fast SSD mounted at /tmp . You can use this as a fast local cache if your program can take advantage of it. Schedule Jobs Many Cryo-EM applications can make use of GPUs as co-processors. In order to use a GPU on McCleary you must allocate a job on a partition with GPUs available and explicitly request GPU(s). Make sure to familiarize yourself with our documentation on scheduling jobs and requesting specific resources . 
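As a minimal sketch (the partition name, core count, and time limit are illustrative and should be tuned to your workload), an interactive session with one GPU could be requested with:

salloc -p gpu --gpus=1 -c 4 -t 4:00:00

and the equivalent batch request would carry the same flags as #SBATCH directives, e.g. #SBATCH --partition=gpu and #SBATCH --gpus=1.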
In addition to public partitions that give you access to GPUs, there are pi_cryoem and pi_tomo partitions which are limited to users of the Cryo-EM resources on campus. Please coordinate with the staff from West Campus and CCMI ( See here for contact info ) for access. Software Many Cryo-EM applications are meant to be viewed and interacted with in real-time. This mode of working is not ideal for the way most HPC clusters are set up, so where possible try to prototype a job you would like to run with a smaller dataset or subset of your data. Then develop a script to submit with sbatch . RELION The RELION pipeline operates in two modes. You can use it as a more familiar and beginner-friendly graphical interface, or call the programs involved directly. Once you are comfortable, using the commands directly in scripts submitted with sbatch will allow you to get the most work done the fastest. The authors provide up-to-date hints about performance on their Benchmarks page. If you need technical help (jobs submit fine but having other issues) you should search and submit to their mailing list . Module We have GPU-enabled versions of RELION available on McCleary as software modules . To check which versions are available, run module avail relion . To see specific notes about a particular install, you can use module help , e.g. module help RELION/4.0.0-fosscuda-2020b . Example Job Parameters RELION reserves one worker (slurm task) for orchestrating an MPI-based job, which they call the \"master\". This can lead to inefficient jobs where there are tasks that could be using a GPU but are stuck being the master process. You can request a better layout for your job with a heterogeneous job , allocating CPUs on a cpu-only compute node for the task that will not use GPUs. Here is an example 3D refinement job submission script (replace choose_a_version with the version you want to use): #!/bin/bash #SBATCH --partition=general --ntasks 1 -c2 --job-name=class3D_hetero_01 --mem=10G --output=\"class3D_hetero_01-%j.out\" #SBATCH hetjob #SBATCH --partition=gpu --ntasks 4 -c2 -N1 --mem-per-cpu=16G --gpus-per-task=1 module load RELION/choose_a_version srun --pack-group = 0 ,1 relion_refine_mpi --o hetero/refine3D/job0001 ... --dont_combine_weights_via_disc --j ${ SLURM_CPUS_PER_TASK } --gpu This job submission request will result in RELION using a single task/worker on a general purpose CPU node, and will efficiently find four GPUs even if they aren't all available on the same compute node. Each GPU node task/worker will have a dedicated GPU, two CPU cores, and 30GiB total memory. EMAN2 EMAN2 has always been a bit of a struggle for us to install properly on the clusters. Below are a few options. Conda Install The EMAN2 authors offer some instructions on how to get EMAN2 running in a cluster environment on their install page . The default install may work as well if you avoid using MPI. Container At present, we have a mostly working apptainer container for EMAN2.3 available here: /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif To run a program from EMAN2 using this container you would use a command like: apptainer exec /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif e2projectmanager.py Cryosparc We have a whole separate page about this one; it is a bit involved. Other Software We have CCP4, Phenix and some other software modules of interest installed. Run module avail and the software name to search for them.
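For example (module names and capitalization here are illustrative; check the actual output on the cluster):

module avail ccp4
module avail phenix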
If you can't find one you need, please contact us .","title":"Cryo-EM on McCleary"},{"location":"clusters-at-yale/guides/cryoem/#cryogenic-electron-microscopy-cryo-em-data-processing-on-mccleary","text":"Below is a work in progress collection of general hints, tips and tricks for running your work on McCleary . As always, if anything below is unclear or could use updating, please let us know during office hours, via email or through our web ticketing system .","title":"Cryogenic Electron Microscopy (Cryo-EM) Data Processing on McCleary"},{"location":"clusters-at-yale/guides/cryoem/#storage","text":"Be wary of you and your group's storage quotas. Run getquota from time to time to make sure there isn't usage you aren't expecting. We strongly recommend that you archive raw data off-cluster, as only home directories are backed up . Let us know if you need extra space and we can work with you to find a solution that is right for your project and your group. On most GPU nodes there is a fast SSD mounted at /tmp . You can use this as a fast local cache if your program can take advantage of it.","title":"Storage"},{"location":"clusters-at-yale/guides/cryoem/#schedule-jobs","text":"Many Cryo-EM applications can make use of GPUs as co-processors. In order to use a GPU on McCleary you must allocate a job on a partition with GPUs available and explicitly request GPU(s). Make sure to familiarize yourself with our documentation on scheduling jobs and requesting specific resources . In addition to public partitions that give you access to GPUs, there are pi_cryoem and pi_tomo partitions which are limited to users of the Cryo-EM resources on campus. Please coordinate with the staff from West Campus and CCMI ( See here for contact info ) for access.","title":"Schedule Jobs"},{"location":"clusters-at-yale/guides/cryoem/#software","text":"Many Cryo-EM applications are meant to be viewed and interacted with in real-time. This mode of working is not ideal for the way most HPC clusters are set up, so where possible try to prototype a job you would like to run with a smaller dataset or subset of your data. Then develop a script to submit with sbatch .","title":"Software"},{"location":"clusters-at-yale/guides/cryoem/#relion","text":"The RELION pipeline operates in two modes. You can use it as a more familiar and beginner-friendly graphical interface, or call the programs involved directly. Once you are comfortable, using the commands directly in scripts submitted with sbatch will allow you to get the most work done the fastest. The authors provide up-to-date hints about performance on their Benchmarks page. If you need technical help (jobs submit fine but having other issues) you should search and submit to their mailing list .","title":"RELION"},{"location":"clusters-at-yale/guides/cryoem/#module","text":"We have GPU-enabled versions of RELION available on McCleary as software modules . To check witch versions are available, run module avail relion . To see specific notes about a particular install, you can use module help , e.g. module help RELION/4.0.0-fosscuda-2020b .","title":"Module"},{"location":"clusters-at-yale/guides/cryoem/#example-job-parameters","text":"RELION reserves one worker (slurm task) for orchestrating an MPI-based job, which they call the \"master\". This can lead to inefficient jobs where there are tasks that could be using a GPU but are stuck being the master process. 
You can request a better layout for your job with a heterogeneous job , allocating CPUs on a cpu-only compute node for the task that will not use GPUs. Here is an example 3D refinement job submission script (replace choose_a_version with the version you want to use): #!/bin/bash #SBATCH --partition=general --ntasks 1 -c2 --job-name=class3D_hetero_01 --mem=10G --output=\"class3D_hetero_01-%j.out\" #SBATCH hetjob #SBATCH --partition=gpu --ntasks 4 -c2 -N1 --mem-per-cpu=16G --gpus-per-task=1 module load RELION/choose_a_version srun --pack-group = 0 ,1 relion_refine_mpi --o hetero/refine3D/job0001 ... --dont_combine_weights_via_disc --j ${ SLURM_CPUS_PER_TASK } --gpu This job submission request will result in RELION using a single task/worker on a general purpose CPU node, and will efficiently find four GPUs even if they aren't all available on the same compute node. Each GPU node task/worker will have a dedicated GPU, two CPU cores, and 30GiB total memory.","title":"Example Job Parameters"},{"location":"clusters-at-yale/guides/cryoem/#eman2","text":"EMAN2 has always been a bit of a struggle for us to install properly on the clusters. Below are a few options.","title":"EMAN2"},{"location":"clusters-at-yale/guides/cryoem/#conda-install","text":"The EMAN2 authors offer some instructions on how to get EMAN2 running in a cluster environment on their install page . The default install may work as well if you avoid using MPI.","title":"Conda Install"},{"location":"clusters-at-yale/guides/cryoem/#container","text":"At present, we have a mostly working apptainer container for EMAN2.3 available here: /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif To run a program from EMAN2 using this container you would use a command like: apptainer exec /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif e2projectmanager.py","title":"Container"},{"location":"clusters-at-yale/guides/cryoem/#cryosparc","text":"We have a whole separate page about this one; it is a bit involved.","title":"Cryosparc"},{"location":"clusters-at-yale/guides/cryoem/#other-software","text":"We have CCP4, Phenix and some other software modules of interest installed. Run module avail and the software name to search for them. If you can't find one you need, please contact us .","title":"Other Software"},{"location":"clusters-at-yale/guides/cryosparc/","text":"cryoSPARCv2 on Farnam Getting cryoSPARC set up and running on the YCRC clusters is something of a task. This guide is meant for intermediate/advanced users. If enough people can convince Structura bio ( see ticket here ) to make cryoSPARC more cluster-friendly we could have a single instance running that you'd just log in to with your Yale credentials. Until then, venture below at your own peril. Install Before you get started, you will need to request a license from Structura via their website . These instructions are gently modified from the official cryoSPARC documentation . 1. Set up Environment First allocate an interactive job on a compute node to run the install on. salloc --cpus-per-task 2 Then, set the following environment variables to suit your install. We filled in some defaults for you. # where to install cryosparc2 and its sample database install_path = $( readlink -f ${ HOME } /project ) /software/cryosparc2 # the license ID you got from Structura license_id = # your email my_email = $( head -n1 ~/.forward ) # slurm partition to submit your cryosparc jobs to # not sure you can change at runtime? partition = gpu 2.
Set up Directories, Download installers # your username my_name = ${ USER } # a temp password cryosparc_passwd = Password123 # load the right CUDA module load CUDA/9.0.176 # set up some more paths db_path = ${ install_path } /database worker_path = ${ install_path } /cryosparc2_worker ssd_path = /tmp/ ${ USER } /cryosparc2_cache # go get the installers mkdir -p $install_path cd $install_path curl -sL https://get.cryosparc.com/download/master-latest/ $license_id > cryosparc2_master.tar.gz curl -sL https://get.cryosparc.com/download/worker-latest/ $license_id > cryosparc2_worker.tar.gz tar -xf cryosparc2_master.tar.gz tar -xf cryosparc2_worker.tar.gz 3. Install the Server and Worker cd ${ install_path } /cryosparc2_master ./install.sh --license $license_id --hostname $( hostname ) --dbpath $db_path --yes source ~/.bashrc cd ${ install_path } /cryosparc2_worker ./install.sh --license $license_id --cudapath $CUDA_HOME --yes source ~/.bashrc 4. Configure for Farnam # Farnam cluster setup mkdir -p ${ install_path } /site_configs && cd ${ install_path } /site_configs cat << EOF > cluster_info.json { \"name\" : \"farnam\", \"worker_bin_path\" : \"${install_path}/cryosparc2_worker/bin/cryosparcw\", \"cache_path\" : \"/tmp/{{ cryosparc_username }}/cryosparc_cache\", \"send_cmd_tpl\" : \"{{ command }}\", \"qsub_cmd_tpl\" : \"sbatch {{ script_path_abs }}\", \"qstat_cmd_tpl\" : \"squeue -j {{ cluster_job_id }}\", \"qdel_cmd_tpl\" : \"scancel {{ cluster_job_id }}\", \"qinfo_cmd_tpl\" : \"sinfo\" } EOF cat << EOF > cluster_script.sh #!/usr/bin/env bash #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} #SBATCH -c {{ num_cpu }} #SBATCH --gpus={{ num_gpu }} #SBATCH -p ${partition} #SBATCH --mem={{ (ram_gb*1024)|int }} #SBATCH -o {{ job_dir_abs }} #SBATCH -e {{ job_dir_abs }} module load CUDA/9.0.176 mkdir -p /tmp/${USER}/cryosparc2_cache {{ run_cmd }} EOF Run salloc --cpus-per-task 2 master_host = $( hostname ) base_dir = $( dirname \" $( dirname \" $( which cryosparcm ) \" ) \" ) sed -i.bak 's/export CRYOSPARC_MASTER_HOSTNAME.*$/export CRYOSPARC_MASTER_HOSTNAME=\\\"' \" $master_host \" '\\\"/g' $base_dir /config.sh source $base_dir /config.sh cryosparcm start cryosparcm status # run the output from the following command on your local linux/mac machine echo \"ssh -N -L $CRYOSPARC_BASE_PORT : $master_host : $CRYOSPARC_BASE_PORT $USER @mccleary.ycrc.yale.edu\" Database errors If your database won't start and you're sure there isn't another server running, you can remove lock files and try again. # rm -f $CRYOSPARC_DB_PATH/WiredTiger.lock $CRYOSPARC_DB_PATH/mongod.lock","title":"cryoSPARCv2 on Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#cryosparcv2-on-farnam","text":"Getting cryoSPARC set up and running on the YCRC clusters is something of a task. This guide is meant for intermediate/advanced users. If enought people can convince Structura bio ( see ticket here ) to make cryoSPARC more cluster-friendly we could have a single instance running that you'd just log in to with your Yale credentials. Until then, venture below at your own peril.","title":"cryoSPARCv2 on Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#install","text":"Before you get started, you will need to request a licence from Structura from their website . These instructions are gently modified from the official cryoSPARC documentation .","title":"Install"},{"location":"clusters-at-yale/guides/cryosparc/#1-set-up-environment","text":"First allocate an interactive job on a compute node to run the install on. 
salloc --cpus-per-task 2 Then, set the following environment variables to suit your install. We filled in some defaults for you. # where to install cryosparc2 and its sample database install_path = $( readlink -f ${ HOME } /project ) /software/cryosparc2 # the license ID you got from Structura license_id = # your email my_email = $( head -n1 ~/.forward ) # slurm partition to submit your cryosparc jobs to # not sure you can change at runtime? partition = gpu","title":"1. Set up Environment"},{"location":"clusters-at-yale/guides/cryosparc/#2-set-up-directories-download-installers","text":"# your username my_name = ${ USER } # a temp password cryosparc_passwd = Password123 # load the right CUDA module load CUDA/9.0.176 # set up some more paths db_path = ${ install_path } /database worker_path = ${ install_path } /cryosparc2_worker ssd_path = /tmp/ ${ USER } /cryosparc2_cache # go get the installers mkdir -p $install_path cd $install_path curl -sL https://get.cryosparc.com/download/master-latest/ $license_id > cryosparc2_master.tar.gz curl -sL https://get.cryosparc.com/download/worker-latest/ $license_id > cryosparc2_worker.tar.gz tar -xf cryosparc2_master.tar.gz tar -xf cryosparc2_worker.tar.gz","title":"2. Set up Directories, Download installers"},{"location":"clusters-at-yale/guides/cryosparc/#3-install-the-server-and-worker","text":"cd ${ install_path } /cryosparc2_master ./install.sh --license $license_id --hostname $( hostname ) --dbpath $db_path --yes source ~/.bashrc cd ${ install_path } /cryosparc2_worker ./install.sh --license $license_id --cudapath $CUDA_HOME --yes source ~/.bashrc","title":"3. Install the Server and Worker"},{"location":"clusters-at-yale/guides/cryosparc/#4-configure-for-farnam","text":"# Farnam cluster setup mkdir -p ${ install_path } /site_configs && cd ${ install_path } /site_configs cat << EOF > cluster_info.json { \"name\" : \"farnam\", \"worker_bin_path\" : \"${install_path}/cryosparc2_worker/bin/cryosparcw\", \"cache_path\" : \"/tmp/{{ cryosparc_username }}/cryosparc_cache\", \"send_cmd_tpl\" : \"{{ command }}\", \"qsub_cmd_tpl\" : \"sbatch {{ script_path_abs }}\", \"qstat_cmd_tpl\" : \"squeue -j {{ cluster_job_id }}\", \"qdel_cmd_tpl\" : \"scancel {{ cluster_job_id }}\", \"qinfo_cmd_tpl\" : \"sinfo\" } EOF cat << EOF > cluster_script.sh #!/usr/bin/env bash #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} #SBATCH -c {{ num_cpu }} #SBATCH --gpus={{ num_gpu }} #SBATCH -p ${partition} #SBATCH --mem={{ (ram_gb*1024)|int }} #SBATCH -o {{ job_dir_abs }} #SBATCH -e {{ job_dir_abs }} module load CUDA/9.0.176 mkdir -p /tmp/${USER}/cryosparc2_cache {{ run_cmd }} EOF","title":"4. Configure for Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#run","text":"salloc --cpus-per-task 2 master_host = $( hostname ) base_dir = $( dirname \" $( dirname \" $( which cryosparcm ) \" ) \" ) sed -i.bak 's/export CRYOSPARC_MASTER_HOSTNAME.*$/export CRYOSPARC_MASTER_HOSTNAME=\\\"' \" $master_host \" '\\\"/g' $base_dir /config.sh source $base_dir /config.sh cryosparcm start cryosparcm status # run the output from the following command on your local linux/mac machine echo \"ssh -N -L $CRYOSPARC_BASE_PORT : $master_host : $CRYOSPARC_BASE_PORT $USER @mccleary.ycrc.yale.edu\"","title":"Run"},{"location":"clusters-at-yale/guides/cryosparc/#database-errors","text":"If your database won't start and you're sure there isn't another server running, you can remove lock files and try again. 
# rm -f $CRYOSPARC_DB_PATH/WiredTiger.lock $CRYOSPARC_DB_PATH/mongod.lock","title":"Database errors"},{"location":"clusters-at-yale/guides/gaussian/","text":"Gaussian Note Access to Gaussian on the Yale clusters is free, but available by request only. To gain access to the installations of Gaussian, please contact us to be added to the gaussian group. Gaussian is an electronic structure modeling program that Yale has licensed for its HPC clusters. The latest version of Gaussian is Gaussian 16, which also includes GaussView 6. Older versions of both applications are also available. To see a full list of available versions of Gaussian on the cluster, run: module avail gaussian Running Gaussian on the Cluster The examples here are for Gaussian 16. In most cases, you could run the older version Gaussian 09 by replacing \"g16\" with \"g09\" wherever it occurs. When running Gaussian, it is recommended that users request exclusive access to allocated nodes (e.g., by requesting all the cpus on the node) and that they specify the largest possible memory allocation for the number of nodes requested. In addition, in most cases, the scratch storage location (set by the environment variable GAUSS_SCRDIR ) should be on the local parallel scratch file system (e.g., scratch60) of the cluster, rather than in the user\u2019s home directory. (This is the default in the Gaussian module files.) Before running Gaussian, you must set up a number of environment variables. This is accomplished most easily by loading the Gaussian module file using: module load Gaussian To run Gaussian interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc -c 4 -t 4 :00:00 See our Slurm documentation for more detailed information on requesting resources for interactive jobs. GaussView In connection with Gaussian 16, we have also installed GaussView 6, Gaussian Inc.'s most advanced and powerful graphical interface for Gaussian. With GaussView, you can import or build the molecular structures that interest you; set up, launch, monitor and control Gaussian calculations; and retrieve and view the results, all without ever leaving the application. GaussView 6 includes many new features designed to make working with large systems of chemical interest convenient and straightforward. It also provides full support for all of the new modeling methods and features in Gaussian 16. In order to use GaussView, you must run an X Server on your desktop or laptop, and you must enable X forwarding when logging into the cluster. See our X11 forwarding documentation for instructions. Loading the module file for Gaussian sets up your environment for GaussView as well. Then you can start GaussView by typing the command gv . GaussView 6 may not be compatible with certain versions of the X servers you may run on your desktop or laptop. If you encounter problems, these can often be overcome by starting GaussView with the command gv -mesagl or gv -soft .","title":"Gaussian"},{"location":"clusters-at-yale/guides/gaussian/#gaussian","text":"Note Access to Gaussian on the Yale clusters is free, but available by request only. To gain access to the installations of Gaussian, please contact us to be added to the gaussian group. Gaussian is an electronic structure modeling program that Yale has licensed for its HPC clusters. The latest version of Gaussian is Gaussian 16, which also includes GaussView 6. Older versions of both applications are also available. 
To see a full list of available versions of Gaussian on the cluster, run: module avail gaussian","title":"Gaussian"},{"location":"clusters-at-yale/guides/gaussian/#running-gaussian-on-the-cluster","text":"The examples here are for Gaussian 16. In most cases, you could run the older version Gaussian 09 by replacing \"g16\" with \"g09\" wherever it occurs. When running Gaussian, it is recommended that users request exclusive access to allocated nodes (e.g., by requesting all the cpus on the node) and that they specify the largest possible memory allocation for the number of nodes requested. In addition, in most cases, the scratch storage location (set by the environment variable GAUSS_SCRDIR ) should be on the local parallel scratch file system (e.g., scratch60) of the cluster, rather than in the user\u2019s home directory. (This is the default in the Gaussian module files.) Before running Gaussian, you must set up a number of environment variables. This is accomplished most easily by loading the Gaussian module file using: module load Gaussian To run Gaussian interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc -c 4 -t 4 :00:00 See our Slurm documentation for more detailed information on requesting resources for interactive jobs.","title":"Running Gaussian on the Cluster"},{"location":"clusters-at-yale/guides/gaussian/#gaussview","text":"In connection with Gaussian 16, we have also installed GaussView 6, Gaussian Inc.'s most advanced and powerful graphical interface for Gaussian. With GaussView, you can import or build the molecular structures that interest you; set up, launch, monitor and control Gaussian calculations; and retrieve and view the results, all without ever leaving the application. GaussView 6 includes many new features designed to make working with large systems of chemical interest convenient and straightforward. It also provides full support for all of the new modeling methods and features in Gaussian 16. In order to use GaussView, you must run an X Server on your desktop or laptop, and you must enable X forwarding when logging into the cluster. See our X11 forwarding documentation for instructions. Loading the module file for Gaussian sets up your environment for GaussView as well. Then you can start GaussView by typing the command gv . GaussView 6 may not be compatible with certain versions of the X servers you may run on your desktop or laptop. If you encounter problems, these can often be overcome by starting GaussView with the command gv -mesagl or gv -soft .","title":"GaussView"},{"location":"clusters-at-yale/guides/github/","text":"Version control with Git and GitHub What is version control? Version contol is an easy and powerful way to track changes to your work. This extends from code to writing documents (if using LaTeX/Tex). It produces and saves \"tagged\" copies of your project so that you don't need to worry about breaking your code-base. This provides a \"coding safety net\" to let you try new features while retaining the ability to roll-back to a working version. Whether developing large frameworks or simply working on small scripts, version control is an important tool to ensure that your work is never lost. We recommend using git for its flexibility and versatility and GitHub for its power in enabling research and collaboration. 1 Here we will cover the basics of version control and how to use git and GitHub. What is git and how does it work? 
Git is a tool that tracks changes to a file (or set of files) through a series of snapshots called \"commits\" or \"revisions\". These snapshots are stored in \"repositories\" which contain the history of all the changes to that file. This helps prevent repetitive naming or project_final_final2_v3.txt problems. It acts as a record of all the edits, along with the ability to compare the current version to previous commits. How to create a git repository You can create a repository at any time by running the following commands: cd /path/to/your/project # initialize the repository git init # add files to be tracked git add main.py input.txt # commit the files to the repository, creating the first snapshot git commit -m \"Initial Commit\" This sets up a repository containing a single snapshot of the project's two files. We can then edit these files and commit the changes into a new snapshot: # edit files echo \"changed this file\" >> input.txt $ git status On branch main Changes not staged for commit: ( use \"git add ...\" to update what will be committed ) ( use \"git checkout -- ...\" to discard changes in working directory ) modified: input.txt no changes added to commit ( use \"git add\" and/or \"git commit -a\" ) Finally, we can stage input.txt and then commit the changes: # stage changes for commit git add input.txt git commit -m \"modified input file\" Configuring git It's very helpful to configure your email and username with git : git config --global user.name \"Your Name\" git config --global user.email \"your.email@yale.edu\" This will then tag your changes with your name and email when collaborating with people on a larger project. Working with remote repositories on GitHub We recommend using an off-site repository like GitHub that provides a secure and co-located backup of your local repositories. To start, create a repository on GitHub by going to https://github.com/new and providing a name and choosing either public or private access. Then you can connect your local repository to the GitHub repo (named my_new_repo ): git remote add origin git@github.com:user_name/my_new_repo.git git push -u origin main Alternatively, a repository can be created on GitHub and then cloned to your local machine: $ git clone git@github.com:user_name/my_new_repo.git Cloning into 'my_new_repo' ... remote: Enumerating objects: 3 , done . remote: Counting objects: 100 % ( 3 /3 ) , done . remote: Total 3 ( delta 0 ) , reused 0 ( delta 0 ) , pack-reused 0 Receiving objects: 100 % ( 3 /3 ) , done . This creates a new directory ( my_new_repo ) where you can place all your code. After making any changes and committing them to the local repository, you can \"push\" them to a remote repository: # commit to local repository git commit -m \"new changes\" # push commits to remote repository on GitHub git push Educational GitHub All students and research staff are able to request free Educational discounts from GitHub. This provides a \"Pro\" account for free, including unlimited private repositories. To get started, create a free GitHub account with your Yale email address. Then go to https://education.github.com and request the educational discount. It normally takes less than 24 hours for them to grant the discount. Educational discounts are also available for teams and collaborations. This is perfect for a research group or collaboration and can include non-Yale affiliated people.
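As a final tip before the links below, here are a few everyday commands for inspecting the history described earlier (a sketch; the file name is just an example):

git log --oneline                 # compact list of all commits
git diff HEAD~1 -- input.txt      # compare a file against the previous commit
git checkout HEAD~1 -- input.txt  # restore a file from the previous snapshot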
Resources and links YCRC Version Control Bootcamp Educational GitHub GitHub's Try-it Instruqt Getting Started With Git We do not recommend the use of https://git.yale.edu , which is an internal-only tool not designed for research use. \u21a9","title":"GitHub"},{"location":"clusters-at-yale/guides/github/#version-control-with-git-and-github","text":"","title":"Version control with Git and GitHub"},{"location":"clusters-at-yale/guides/github/#what-is-version-control","text":"Version contol is an easy and powerful way to track changes to your work. This extends from code to writing documents (if using LaTeX/Tex). It produces and saves \"tagged\" copies of your project so that you don't need to worry about breaking your code-base. This provides a \"coding safety net\" to let you try new features while retaining the ability to roll-back to a working version. Whether developing large frameworks or simply working on small scripts, version control is an important tool to ensure that your work is never lost. We recommend using git for its flexibility and versatility and GitHub for its power in enabling research and collaboration. 1 Here we will cover the basics of version control and how to use git and GitHub.","title":"What is version control?"},{"location":"clusters-at-yale/guides/github/#what-is-git-and-how-does-it-work","text":"Git is a tool that tracks changes to a file (or set of files) through a series of snapshots called \"commits\" or \"revisions\". These snapshots are stored in \"repositories\" which contain the history of all the changes to that file. This helps prevent repetative naming or project_final_final2_v3.txt problems. It acts as a record of all the edits, along with the ability to compare the current version to previous commits.","title":"What is git and how does it work?"},{"location":"clusters-at-yale/guides/github/#how-to-create-a-git-repository","text":"You can create a repository at any time by running the following commands: cd /path/to/your/project # initialize the repository git init # add files to be tracked git add main.py input.txt # commit the files to the repository, creating the first snapshot git commit -m \"Initial Commit\" This sets up a repository containing a single snapshot of the project's two files. We can then edit these files and commit the changes into a new snapshot: # edit files echo \"changed this file\" >> input.txt $ git status On branch main Changes not staged for commit: ( use \"git add ...\" to update what will be committed ) ( use \"git checkout -- ...\" to discard changes in working directory ) modified: input.txt no changes added to commit ( use \"git add\" and/or \"git commit -a\" ) Finally, we can stage input.txt and then commit the changes: # stage changes for commit git add input.txt git commit -m \"modified input file\"","title":"How to create a git repository"},{"location":"clusters-at-yale/guides/github/#configuring-git","text":"It's very helpful to configure your email and username with git : git config --global user.name \"Your Name\" git config --global user.email \"your.email@yale.edu\" This will then tag your changes with your name and email when collaborating with people on a larger project.","title":"Configuring git"},{"location":"clusters-at-yale/guides/github/#working-with-remote-repositories-on-github","text":"We recommend using an off-site repository like GitHub that provides a secure and co-located backup of your local repositories. 
To start, create a repository on GitHub by going to https://github.com/new and providing a name and choose either public or private access. Then you can connect your local repository to the GitHub repo (named my_new_repo ): git remote add origin git@github.com:user_name/my_new_repo.git git push -u origin main Alternatively, a repository can be created on GitHub and then cloned to your local machine: $ git clone git@github.com:user_name/my_new_repo.git Cloning into 'my_new_repo' ... remote: Enumerating objects: 3 , done . remote: Counting objects: 100 % ( 3 /3 ) , done . remote: Total 3 ( delta 0 ) , reused 0 ( delta 0 ) , pack-reused 0 Receiving objects: 100 % ( 3 /3 ) , done . This creates a new directory ( my_new_repo ) where you can place all your code. After making any changes and commiting them to the local repository, you can \"push\" them to a remote repository: # commit to local repository git commit -m \"new changes\" # push commits to remote repository on GitHub git push","title":"Working with remote repositories on GitHub"},{"location":"clusters-at-yale/guides/github/#educational-github","text":"All students and research staff are able to request free Educational discounts from GitHub. This provides a \"Pro\" account for free, including unlimited private repositories. To get started, create a free GitHub account with your Yale email address. Then go to https://education.github.com and request the educational discount. It normally takes less than 24 hours for them to grant the discount. Educational discounts are also available for teams and collaborations. This is perfect for a research group or collaboration and can include non-Yale affiliated people.","title":"Educational GitHub"},{"location":"clusters-at-yale/guides/github/#resources-and-links","text":"YCRC Version Control Bootcamp Educational GitHub GitHub's Try-it Instruqt Getting Started With Git We do not recommend the use of https://git.yale.edu , which is an internal-only tool not designed for research use. \u21a9","title":"Resources and links"},{"location":"clusters-at-yale/guides/github_pages/","text":"GitHub Pages Personal Website A personal website is a great way to build an online presence for both academic and professional activities. We recommend using GitHub Pages as a tool to maintain and host static websites and blogs. Unlike other hosting platforms, the whole website can be written using Markdown , a simple widely-used markup language. GitHub provides a tutorial to get started with Markdown ( link ). To get started, you're going to need a GitHub account. You can follow the instructions on our GitHub guide to set up a free account. Once you have an account, you will need to create a repository for your website. It's important that you name your repository username.github.io where username is replaced with your actual account name ( ycrc-test in this example). Make sure to initialize the repo with a README, which will help get things started. After clicking \"Create\" your repository will look like this: From here, you can click on \"Settings\" to enable GitHub Pages publication of your site. Scroll down until you see GitHub Pages : GitHub provides a number of templates to help make your website look professional. Click on \"Choose a Theme\" to see examples of these themes: Pick one that you like and click \"Select theme\". Note, some of these themes are aimed at blogs versus project sites, pick one that best fits your desired style. You can change this later, so feel free to try one out and see what you think. 
After selecting your theme, you will be directed back to your repository where the README.md has been updated with some basics about how Markdown works and how you can start creating your website. Scroll down and commit these changes (leaving the sample text in place). You can now take a look at how GitHub is rendering your site: That's it, this site is now hosted at ycrc-test.github.io ! You now have a simple-to-edit and customize site that can be used to host your CV, detail your academic research, or showcase your independent projects. Project website In addition to hosting a stand-alone website, GitHub Pages can be used to create pages for specific projects or repositories. Here we will take an existing repository amazing-python-project and add a GitHub Pages website on a new branch. Click on the Branch pull-down and create a new branch titled gh-pages : Remove any files from that branch and create a new file called index.md : Add content to the page using Markdown syntax: To customize the site, click on Settings and then scroll down to GitHub Pages : Click on the Theme Chooser and select your favorite style: Finally, you can navigate to your website and see it live! Conclusions We have detailed two ways to add static websites to your work, either as a professional webpage or a project-specific site. This can help increase your works impact and give you a platform to showcase your work. Further Reading Jekyll : the tool that powers GitHub Pages GitHub Learning Lab Academic Pages : forkable template for academic websites Jekyll Academic Example GitHub Pages Websites GitHub and Government , https://github.com/github/government.github.com ElectronJS , https://github.com/electron/electronjs.org Twitter GitHub , https://github.com/twitter/twitter.github.io React , https://github.com/facebook/react","title":"GitHub Pages"},{"location":"clusters-at-yale/guides/github_pages/#github-pages","text":"","title":"GitHub Pages"},{"location":"clusters-at-yale/guides/github_pages/#personal-website","text":"A personal website is a great way to build an online presence for both academic and professional activities. We recommend using GitHub Pages as a tool to maintain and host static websites and blogs. Unlike other hosting platforms, the whole website can be written using Markdown , a simple widely-used markup language. GitHub provides a tutorial to get started with Markdown ( link ). To get started, you're going to need a GitHub account. You can follow the instructions on our GitHub guide to set up a free account. Once you have an account, you will need to create a repository for your website. It's important that you name your repository username.github.io where username is replaced with your actual account name ( ycrc-test in this example). Make sure to initialize the repo with a README, which will help get things started. After clicking \"Create\" your repository will look like this: From here, you can click on \"Settings\" to enable GitHub Pages publication of your site. Scroll down until you see GitHub Pages : GitHub provides a number of templates to help make your website look professional. Click on \"Choose a Theme\" to see examples of these themes: Pick one that you like and click \"Select theme\". Note, some of these themes are aimed at blogs versus project sites, pick one that best fits your desired style. You can change this later, so feel free to try one out and see what you think. 
After selecting your theme, you will be directed back to your repository where the README.md has been updated with some basics about how Markdown works and how you can start creating your website. Scroll down and commit these changes (leaving the sample text in place). You can now take a look at how GitHub is rendering your site: That's it, this site is now hosted at ycrc-test.github.io ! You now have a simple-to-edit and customize site that can be used to host your CV, detail your academic research, or showcase your independent projects.","title":"Personal Website"},{"location":"clusters-at-yale/guides/github_pages/#project-website","text":"In addition to hosting a stand-alone website, GitHub Pages can be used to create pages for specific projects or repositories. Here we will take an existing repository amazing-python-project and add a GitHub Pages website on a new branch. Click on the Branch pull-down and create a new branch titled gh-pages : Remove any files from that branch and create a new file called index.md : Add content to the page using Markdown syntax: To customize the site, click on Settings and then scroll down to GitHub Pages : Click on the Theme Chooser and select your favorite style: Finally, you can navigate to your website and see it live!","title":"Project website"},{"location":"clusters-at-yale/guides/github_pages/#conclusions","text":"We have detailed two ways to add static websites to your work, either as a professional webpage or a project-specific site. This can help increase your works impact and give you a platform to showcase your work.","title":"Conclusions"},{"location":"clusters-at-yale/guides/github_pages/#further-reading","text":"Jekyll : the tool that powers GitHub Pages GitHub Learning Lab Academic Pages : forkable template for academic websites Jekyll Academic","title":"Further Reading"},{"location":"clusters-at-yale/guides/github_pages/#example-github-pages-websites","text":"GitHub and Government , https://github.com/github/government.github.com ElectronJS , https://github.com/electron/electronjs.org Twitter GitHub , https://github.com/twitter/twitter.github.io React , https://github.com/facebook/react","title":"Example GitHub Pages Websites"},{"location":"clusters-at-yale/guides/gpus-cuda/","text":"GPUs and CUDA There are GPUs available for general use on the YCRC clusters. In order to use them, you must request them for your job . See the Grace , McCleary , and Milgram pages for hardware and partition specifics. Please do not use nodes with GPUs unless your application or job can make use of them. Any jobs submitted to a GPU partition without having requested a GPU may be terminated without warning. Monitor Activity and Drivers The CUDA libraries you load will allow you to compile code against them. To run CUDA-enabled code you must also be running on a node with a gpu allocated and a compatible driver installed. The minimum driver versions are listed on this nvidia developer site . You can check the available GPUs, their current usage, installed version of the nvidia drivers, and more with the command nvidia-smi . 
Either in an interactive job or after connecting to a node running your job with ssh , nvidia-smi output should look something like this: [ user@gpu01 ~ ] $ nvidia-smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460 .32.03 Driver Version: 460 .32.03 CUDA Version: 11 .2 | | -------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | =============================== + ====================== + ====================== | | 0 GeForce GTX 108 ... On | 00000000 :02:00.0 Off | N/A | | 23 % 34C P8 9W / 250W | 1MiB / 11178MiB | 0 % Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | | ============================================================================= | | No running processes found | +-----------------------------------------------------------------------------+ Here we see that the node gpu01 is running driver version 460.32.03 and is compatible with CUDA version 11.2. There are no processes using the GPU allocated to this job. Software Cuda, cuDNN, tensorflow, and pytorch availability on cluster We have built certain versions of CUDA, cuDNN, tensorflow, and pytorch on all the clusters YCRC maintains. If one of the versions of these modules aligns with the version needed for your research, then there may be no need to install these programs yourself. To list all the modules available for these programs: module avail cuda/ module avail cudnn/ module avail tensorflow module avail pytorch Tensorflow You can find hints about the correct version of Tensorflow from their tested build configurations . You can also test your install with a simple script that imports Tensorflow (run on a GPU node). If you get an ImportError that mentions missing libraries like libcublas.so.9.0 , for example, that means that Tensorflow is probably expecting CUDA v 9.0 but cannot find it. Tensorflow-gpu Tensorflow-gpu is now deprecated for newer versions of CUDA and cuDNN and has been combined with the original tensorflow. Any version of tensorflow 2.* contains gpu capabilities and should be installed instead of attempting to install tensorflow-gpu. Create an Example Tensorflow Environment To create a conda environment with Tensorflow that uses the module CUDA: # load modules, including the system CUDA and cuDNN module load miniconda CUDAcore/11.3.1 cuDNN/8.2.1.32-CUDA-11.3.1 # save module collection for future use module save cuda11 #create environment with required dependencies conda create --name tf-modulecuda python = 3 .11.* numpy pandas matplotlib jupyter -c conda-forge # activate environment conda activate tf-modulecuda # use pip to install tensorflow pip install tensorflow == 2 .12.* The most up to date instructions for creating your own cuda/tensorflow environment can be found here .
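Once an environment like this is activated on a GPU node, a quick optional sanity check (a sketch, not part of the install itself) is to ask Tensorflow which GPUs it can see:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

An empty list usually means the CUDA/cuDNN libraries are not visible to the environment.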
To create a conda environment with your own versions of Cuda and tensorflow: For tensorflow 2.12+: module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .8.0 conda activate tf-condacuda pip install nvidia-cudnn-cu11 == 8 .6.0.163 # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'CUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .12.* For tensorflow 2.11.* module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .3.1 cudnn = 8 .2.1 conda activate tf-condacuda # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .11.* Use Your Environment To re-enter your environment you only need the following: module load miniconda conda activate tf-condacuda Or if using the module-installed CUDA: module restore cuda11 conda activate tf-modulecuda PyTorch As with Tensorflow, sometimes the conda-supplied CUDA libraries are sufficient for the version of PyTorch you are installing. If not, make sure you have the version of CUDA referenced on the PyTorch site in their install instructions . They also provide instructions on installing previous versions compatible with older versions of CUDA. Following the instructions on their site, create a PyTorch environment using conda : module load miniconda conda create --name pytorch_env pytorch torchvision torchaudio pytorch-cuda = 11 .7 -c pytorch -c nvidia Compile .c or .cpp Files with CUDA code By default, nvcc expects that host code is in files with a .c or .cpp extension, and device code is in files with a .cu extension. When you mix device code in a .c or .cpp file with host code, the device code will not be recognized by nvcc unless you add this flag: -x cu . nvcc -x cu mycuda.cpp -o mycuda.exe","title":"GPUs and CUDA"},{"location":"clusters-at-yale/guides/gpus-cuda/#gpus-and-cuda","text":"There are GPUs available for general use on the YCRC clusters. In order to use them, you must request them for your job . See the Grace , McCleary , and Milgram pages for hardware and partition specifics. Please do not use nodes with GPUs unless your application or job can make use of them. Any jobs submitted to a GPU partition without having requested a GPU may be terminated without warning.","title":"GPUs and CUDA"},{"location":"clusters-at-yale/guides/gpus-cuda/#monitor-activity-and-drivers","text":"The CUDA libraries you load will allow you to compile code against them. To run CUDA-enabled code you must also be running on a node with a gpu allocated and a compatible driver installed. The minimum driver versions are listed on this nvidia developer site . You can check the available GPUs, their current usage, installed version of the nvidia drivers, and more with the command nvidia-smi .
Either in an interactive job or after connecting to a node running your job with ssh , nvidia-smi output should look something like this: [ user@gpu01 ~ ] $ nvidia-smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460 .32.03 Driver Version: 460 .32.03 CUDA Version: 11 .2 | | -------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | =============================== + ====================== + ====================== | | 0 GeForce GTX 108 ... On | 00000000 :02:00.0 Off | N/A | | 23 % 34C P8 9W / 250W | 1MiB / 11178MiB | 0 % Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | | ============================================================================= | | No running processes found | +-----------------------------------------------------------------------------+ Here we see that the node gpu01 is running driver version 460.32.03 and is compatible with CUDA version 11.2. There are no processes using the GPU allocated to this job.","title":"Monitor Activity and Drivers"},{"location":"clusters-at-yale/guides/gpus-cuda/#software","text":"","title":"Software"},{"location":"clusters-at-yale/guides/gpus-cuda/#cuda-cudnn-tensorflow-and-pytorch-availability-on-cluster","text":"We have built certain versions of CUDA, cuDNN, tensorflow, and pytorch on all the clusters YCRC maintains. If one of the versions of these modules aligns with the version needed for your research, then there may be no need to install these programs yourself. To list all the modules available for these programs: module avail cuda/ module avail cudnn/ module avail tensorflow module avail pytorch","title":"Cuda, cuDNN, tensorflow, and pytorch availability on cluster"},{"location":"clusters-at-yale/guides/gpus-cuda/#tensorflow","text":"You can find hints about the correct version of Tensorflow from their tested build configurations . You can also test your install with a simple script that imports Tensorflow (run on a GPU node). If you an ImportError that mentions missing libraries like libcublas.so.9.0 , for example, that means that Tensorflow is probably expecting CUDA v 9.0 but cannot find it.","title":"Tensorflow"},{"location":"clusters-at-yale/guides/gpus-cuda/#tensorflow-gpu","text":"Tensorflow-gpu is now depreciated for newer versions of CUDA and cuDNN and has been combined with the original tensorflow. 
Any version of tensorflow 2.* contains gpu capabilities and should be installed instead of attempting to install tensorflow-gpu.","title":"Tensorflow-gpu"},{"location":"clusters-at-yale/guides/gpus-cuda/#create-an-example-tensorflow-environment","text":"To create a conda environment with Tensorflow and uses the module CUDA: # load modules, including the system CUDA and cuDNN module load miniconda CUDAcore/11.3.1 cuDNN/8.2.1.32-CUDA-11.3.1 # save module collection for future use module save cuda11 #create environment with required dependencies conda create --name tf-modulecuda python = 3 .11.* numpy pandas matplotlib jupyter -c conda-forge # activate environment conda activate tf-modulecuda # use pip to install tensorflow pip install tensorflow == 2 .12.* The most up to date instructions for creating your own cuda/tensorflow environment can be found here . To create a conda environment with your own versions of Cuda and tensorflow: For tensorflow 2.12+: module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .8.0 conda activate tf-condacuda pip install nvidia-cudnn-cu11 == 8 .6.0.163 # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'CUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .12.* For tensorflow 2.11.* module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .3.1 cudnn = 8 .2.1 conda activate tf-condacuda # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .11.*","title":"Create an Example Tensorflow Environment"},{"location":"clusters-at-yale/guides/gpus-cuda/#use-your-environment","text":"To re-enter your environment you only need the following: module load miniconda conda activate tf-condacuda Or if using the module-installed CUDA: module restore cuda11 conda activate tf-modulecuda","title":"Use Your Environment"},{"location":"clusters-at-yale/guides/gpus-cuda/#pytorch","text":"As with Tensorflow, sometimes the conda-supplied CUDA libraries are sufficient for the version of PyTorch you are installing. If not make sure you have the version of CUDA referenced on the PyTorch site in their install instructions . They also provide instructions on installing previous versions compatible with older versions of CUDA. Following the instructions on their site, create a PyTorch environment using conda : module load miniconda conda create --name pytorch_env pytorch torchvision torchaudio pytorch-cuda = 11 .7 -c pytorch -c nvidia","title":"PyTorch"},{"location":"clusters-at-yale/guides/gpus-cuda/#compile-c-or-cpp-files-with-cuda-code","text":"By default, nvcc expects that host code is in files with a .c or .cpp extension, and device code is in files with a .cu extension. When you mix device code in a .c or .cpp file with host code, the device code will not be recoganized by nvcc unless you add this flag: -x cu . 
nvcc -x cu mycuda.cpp -o mycuda.exe","title":"Compile .c or .cpp Files with CUDA code"},{"location":"clusters-at-yale/guides/isca/","text":"Isca Isca is a framework used for idealized global circulation modelling. We recommend that you install it for yourself individually as the code expects to be able to modify its source code files. It is relatively straighforward to install into a conda environment as described below. Install Isca Install it for just your user as a Python conda environment called \"isca\". module load netCDF-Fortran/4.5.3-gompi-2020b module load miniconda module save isca mkdir ~/programs cd ~/programs git clone https://www.github.com/execlim/isca.git conda create -n isca python=3.7 conda activate isca conda install tqdm cd isca/src/extra/python pip install -e . Then add the following to your .bashrc file # Isca # directory of the Isca source code export GFDL_BASE=$HOME/programs/isca # \"environment\" configuration for grace export GFDL_ENV=gfortran # temporary working directory used in running the model export GFDL_WORK=$PALMER_SCRATCH/gfdl_work # directory for storing model output export GFDL_DATA=$GIBBS_PROJECT/gfdl_data Select an Experiment and Update the Flags We are using GCC version 10.x for this build, so a slight modification needs to made to Isca for it to build . Add the following line to the experiment script (e.g. $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py ), after cb is defined (so about line 13 in that file). cb.compile_flags.extend(['-fallow-argument-mismatch', '-fallow-invalid-boz']) Run Isca The above commands only need to be run once to set everything up. To use it, you will first always need to run: module restore isca conda activate isca Then you should be able to compile and launch your ISCA models.","title":"Isca"},{"location":"clusters-at-yale/guides/isca/#isca","text":"Isca is a framework used for idealized global circulation modelling. We recommend that you install it for yourself individually as the code expects to be able to modify its source code files. It is relatively straighforward to install into a conda environment as described below.","title":"Isca"},{"location":"clusters-at-yale/guides/isca/#install-isca","text":"Install it for just your user as a Python conda environment called \"isca\". module load netCDF-Fortran/4.5.3-gompi-2020b module load miniconda module save isca mkdir ~/programs cd ~/programs git clone https://www.github.com/execlim/isca.git conda create -n isca python=3.7 conda activate isca conda install tqdm cd isca/src/extra/python pip install -e . Then add the following to your .bashrc file # Isca # directory of the Isca source code export GFDL_BASE=$HOME/programs/isca # \"environment\" configuration for grace export GFDL_ENV=gfortran # temporary working directory used in running the model export GFDL_WORK=$PALMER_SCRATCH/gfdl_work # directory for storing model output export GFDL_DATA=$GIBBS_PROJECT/gfdl_data","title":"Install Isca"},{"location":"clusters-at-yale/guides/isca/#select-an-experiment-and-update-the-flags","text":"We are using GCC version 10.x for this build, so a slight modification needs to made to Isca for it to build . Add the following line to the experiment script (e.g. $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py ), after cb is defined (so about line 13 in that file). 
cb.compile_flags.extend(['-fallow-argument-mismatch', '-fallow-invalid-boz'])","title":"Select an Experiment and Update the Flags"},{"location":"clusters-at-yale/guides/isca/#run-isca","text":"The above commands only need to be run once to set everything up. To use it, you will first always need to run: module restore isca conda activate isca Then you should be able to compile and launch your ISCA models.","title":"Run Isca"},{"location":"clusters-at-yale/guides/jupyter/","text":"Jupyter Notebooks We provide a simple way to start Jupyter Notebook interfaces for Python and R using Open OnDemand . Jupyter notebooks provide a flexible way to interactively work with code and plots presented in-line together. To get started choose Jupyter Notebook from the OOD Interactive Apps menu or click on the link on the dashboard. Before you get started, you will need to be on campus or logged in to the Yale VPN and you will need to set up a Jupyter environment. Set up an environment We recommend you use miniconda to manage your Jupyter environments. You can create Conda environments from the OOD shell interface or from a terminal-based login to the clusters. For example, if you want to create an environment with many commonly used scientific computing Python packages you would run: module load miniconda conda create -y -n notebook_env python jupyter numpy pandas matplotlib Specify your resource request You can use the ycrc_default environment or chose one of your own from the drop-down menu. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Tip If you have installed and want to use Jupyter Lab click the Start JupyterLab checkbox. If there is a specific workflow which OOD does not satisfy, let us know and we can help.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/jupyter/#jupyter-notebooks","text":"We provide a simple way to start Jupyter Notebook interfaces for Python and R using Open OnDemand . Jupyter notebooks provide a flexible way to interactively work with code and plots presented in-line together. To get started choose Jupyter Notebook from the OOD Interactive Apps menu or click on the link on the dashboard. Before you get started, you will need to be on campus or logged in to the Yale VPN and you will need to set up a Jupyter environment.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/jupyter/#set-up-an-environment","text":"We recommend you use miniconda to manage your Jupyter environments. You can create Conda environments from the OOD shell interface or from a terminal-based login to the clusters. For example, if you want to create an environment with many commonly used scientific computing Python packages you would run: module load miniconda conda create -y -n notebook_env python jupyter numpy pandas matplotlib","title":"Set up an environment"},{"location":"clusters-at-yale/guides/jupyter/#specify-your-resource-request","text":"You can use the ycrc_default environment or chose one of your own from the drop-down menu. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Tip If you have installed and want to use Jupyter Lab click the Start JupyterLab checkbox. 
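If JupyterLab is not already present in your environment, one way to add it to the notebook_env created above is the following (a minimal sketch; adjust the environment name to match your own):
module load miniconda
conda install -y -n notebook_env jupyterlab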
If there is a specific workflow which OOD does not satisfy, let us know and we can help.","title":"Specify your resource request"},{"location":"clusters-at-yale/guides/jupyter_ssh/","text":"Jupyter Notebooks over SSH Port Forwarding If you want finer control over your notebook job, or wish to use something besides conda for your Python environment, you can manually configure a Jupyter notebook and connect manually. The main steps are: Start a Jupyter notebook job. Start an ssh tunnel. Use your local browser to connect. Start the Server Here is a template for submitting a jupyter-notebook server as a batch job. You may need to edit some of the slurm options, including the time limit or the partition. You will also need to either load a module that contains jupyter-notebook . Tip If you are using a Conda environment, please follow the instructions for launching a Jupyter session via Open OnDemand . Save your edited version of this script on the cluster, and submit it with sbatch . #!/bin/bash #SBATCH --partition devel #SBATCH --cpus-per-task 1 #SBATCH --mem-per-cpu 8G #SBATCH --time 6:00:00 #SBATCH --job-name jupyter-notebook #SBATCH --output jupyter-notebook-%J.log # get tunneling info XDG_RUNTIME_DIR = \"\" port = $( shuf -i8000-9999 -n1 ) node = $( hostname -s ) user = $( whoami ) cluster = $( hostname -f | awk -F \".\" '{print $2}' ) # print tunneling instructions jupyter-log echo -e \" For more info and how to connect from windows, see https://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/ MacOS or linux terminal command to create your ssh tunnel ssh -N -L ${ port } : ${ node } : ${ port } ${ user } @ ${ cluster } .ycrc.yale.edu Windows MobaXterm info Forwarded port:same as remote port Remote server: ${ node } Remote port: ${ port } SSH server: ${ cluster } .ycrc.yale.edu SSH login: $user SSH port: 22 Use a Browser on your local machine to go to: localhost: ${ port } (prefix w/ https:// if using password) \" # load modules or conda environments here jupyter-notebook --no-browser --port = ${ port } --ip = ${ node } Start the Tunnel Once you have submitted your job and it starts, your notebook server will be ready for you to connect. You can run squeue -u${USER} to check. You will see an \"R\" in the ST or status column for your notebook job if it is running. If you see a \"PD\" in the status column, you will have to wait for your job to start running to connect. The log file with information about how to connect will be in the directory you submitted the script from, and be named jupyter-notebook-[jobid].log where jobid is the slurm id for your job. MacOS and Linux On a Mac or Linux machine, you can start the tunnel with an SSH command. You can check the output from the job you started to get the specifc info you need. Windows On a Windows machine, we recommend you use MobaXterm. See our guide on connecting with MobaXterm for instructions on how to get set up. You will need to take a look at your job's log file to get the details you need. Then start MobaXterm: Under Tools choose \"MobaSSHTunnel (port forwarding)\". Click the \"New SSH Tunnel\" button. Click the radio button for \"Local port forwarding\". Use the information in your jupyter notebook log file to fill out the boxes. Click Save. On your new tunnel, click the key symbol under the settings column and choose your ssh private key. Click the play button under the Start/Stop column. 
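As a quick recap before connecting, the commands below confirm the job is running and reprint the connection details from its log file (a sketch; 12345678 is a placeholder for your actual job id):
# an "R" in the ST column means the notebook job is running
squeue -u ${USER}
# reprint the tunneling instructions written by the job script
cat jupyter-notebook-12345678.log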
Browse the Notebook Finally, open a web browser on your local machine and enter the address http://localhost:port where port is the one specified in your log file. The address Jupyter creates by default (the one with the name of a compute node) will not work outside the cluster's network. Since version 5 of jupyter, the notebook will automatically generate a token that allows you to authenticate when you connect. It is long, and will be at the end of the url jupyter generates. It will look something like http://c14n06:9230/?token=**ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20** If you run into trouble or need help, contact us .","title":"Jupyter Notebooks over SSH Port Forwarding"},{"location":"clusters-at-yale/guides/jupyter_ssh/#jupyter-notebooks-over-ssh-port-forwarding","text":"If you want finer control over your notebook job, or wish to use something besides conda for your Python environment, you can manually configure a Jupyter notebook and connect manually. The main steps are: Start a Jupyter notebook job. Start an ssh tunnel. Use your local browser to connect.","title":"Jupyter Notebooks over SSH Port Forwarding"},{"location":"clusters-at-yale/guides/jupyter_ssh/#start-the-server","text":"Here is a template for submitting a jupyter-notebook server as a batch job. You may need to edit some of the slurm options, including the time limit or the partition. You will also need to either load a module that contains jupyter-notebook . Tip If you are using a Conda environment, please follow the instructions for launching a Jupyter session via Open OnDemand . Save your edited version of this script on the cluster, and submit it with sbatch . #!/bin/bash #SBATCH --partition devel #SBATCH --cpus-per-task 1 #SBATCH --mem-per-cpu 8G #SBATCH --time 6:00:00 #SBATCH --job-name jupyter-notebook #SBATCH --output jupyter-notebook-%J.log # get tunneling info XDG_RUNTIME_DIR = \"\" port = $( shuf -i8000-9999 -n1 ) node = $( hostname -s ) user = $( whoami ) cluster = $( hostname -f | awk -F \".\" '{print $2}' ) # print tunneling instructions jupyter-log echo -e \" For more info and how to connect from windows, see https://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/ MacOS or linux terminal command to create your ssh tunnel ssh -N -L ${ port } : ${ node } : ${ port } ${ user } @ ${ cluster } .ycrc.yale.edu Windows MobaXterm info Forwarded port:same as remote port Remote server: ${ node } Remote port: ${ port } SSH server: ${ cluster } .ycrc.yale.edu SSH login: $user SSH port: 22 Use a Browser on your local machine to go to: localhost: ${ port } (prefix w/ https:// if using password) \" # load modules or conda environments here jupyter-notebook --no-browser --port = ${ port } --ip = ${ node }","title":"Start the Server"},{"location":"clusters-at-yale/guides/jupyter_ssh/#start-the-tunnel","text":"Once you have submitted your job and it starts, your notebook server will be ready for you to connect. You can run squeue -u${USER} to check. You will see an \"R\" in the ST or status column for your notebook job if it is running. If you see a \"PD\" in the status column, you will have to wait for your job to start running to connect. The log file with information about how to connect will be in the directory you submitted the script from, and be named jupyter-notebook-[jobid].log where jobid is the slurm id for your job.","title":"Start the Tunnel"},{"location":"clusters-at-yale/guides/jupyter_ssh/#macos-and-linux","text":"On a Mac or Linux machine, you can start the tunnel with an SSH command. 
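The command printed in your log file has the general shape shown below (the node name, port, NetID, and cluster here are placeholders; copy the exact command from your own jupyter-notebook-[jobid].log):
ssh -N -L 9230:c14n06:9230 netid@grace.ycrc.yale.edu
Leave this command running in its own terminal for as long as you want the tunnel open.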
You can check the output from the job you started to get the specifc info you need.","title":"MacOS and Linux"},{"location":"clusters-at-yale/guides/jupyter_ssh/#windows","text":"On a Windows machine, we recommend you use MobaXterm. See our guide on connecting with MobaXterm for instructions on how to get set up. You will need to take a look at your job's log file to get the details you need. Then start MobaXterm: Under Tools choose \"MobaSSHTunnel (port forwarding)\". Click the \"New SSH Tunnel\" button. Click the radio button for \"Local port forwarding\". Use the information in your jupyter notebook log file to fill out the boxes. Click Save. On your new tunnel, click the key symbol under the settings column and choose your ssh private key. Click the play button under the Start/Stop column.","title":"Windows"},{"location":"clusters-at-yale/guides/jupyter_ssh/#browse-the-notebook","text":"Finally, open a web browser on your local machine and enter the address http://localhost:port where port is the one specified in your log file. The address Jupyter creates by default (the one with the name of a compute node) will not work outside the cluster's network. Since version 5 of jupyter, the notebook will automatically generate a token that allows you to authenticate when you connect. It is long, and will be at the end of the url jupyter generates. It will look something like http://c14n06:9230/?token=**ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20** If you run into trouble or need help, contact us .","title":"Browse the Notebook"},{"location":"clusters-at-yale/guides/mathematica/","text":"Mathematica Open OnDemand We strongly recommend using Open OnDemand to launch Mathematica. First, open OOD in a browser and navigate to the Apps button. Select All Apps from the drop-down menu and then select Mathematica from the list. Fill in your resource requests and launch your job. Once started, click Launch Mathematica and Mathematica will be opened in a new tab in the browser. Interactive Job Alternatively, you could start an interacgive session with X11 forwarding. Warning The Mathematica program is too large to fit on a login node. If you try to run it there, it will crash. Instead, launch it in an interactive job (see below). To run Mathematica interactively, you need to request an interactive session on a compute node. You could start an interactive session using Slurm. For example, to use 4 cores on 1 node: salloc --x11 -c 4 -t 4:00:00 Note that if you are on macOS, you will need to install an additional program to use the GUI. See our X11 Forwarding documentation for instructions. See our Slurm documentation for more detailed information on requesting resources for interactive jobs. To launch Mathematica, you will first need to make sure you have the correct module loaded. You can search for all available Mathematica versions: module avail mathematica Load the appropriate module file. For example, to run version 12.0.0: module load Mathematica/12.0.0 The module load command sets up your environment, including the PATH to find the proper version of the Mathematica program. If you would like to avoid running the load command every session, you can run module save and then the Mathematica module will be loaded every time you login. Once you have the appropriate module loaded in an interactive job, start Mathematica. The & will put the program in the background so you can continue to use your terminal session. 
Mathematica & Configure Environment for Parallel Jobs Mathematica installed on Yale HPC clusters includes our proprietary scripts to run parallel jobs in SLURM environments. These scripts are designed in a way to allow users to access up to 450 parallel kernels. When a user asks for a specific number of kernels, the wait time to get them might differ dramatically depending on requested computing resources as well as on how busy the HPC cluster is at that moment. To reduce waiting time, our scripts try to launch as many kernels as possible at the moment the user asks for them. Most of the time you will not get launched with the same number of kernels as you requested. We recommend checking the final number of parallel kernels you\u2019ve gotten after the launching command has completed no matter if you run a Front End Mathematica session or execute Wolfram script. One of the ways to check this would be the Mathematica command Length[Kernels[]] . In order to run parallel Mathematica jobs on our cluster, you will need to configure your Mathematica environment. You have to do this within a Front End session. If you run Wolfram script you need to run a Front End session to set your parallel environment before executing your script. Once Mathematica is started, open a new document in the Mathematica window and go to Edit > Preferences . From there, go to Evaluate/Parallel Kernel Configuration and change the following settings: Under Local Kernels , disable Local Kernels if it is enabled Go in Cluster Integration and first enable cluster integration it if it is not enabled Under the Cluster Integration tab, expand the Advanced Settings arrow. When you configure parallel kernels for the first time, please select SLURM from the Cluster Engine pull-down menu Matching parallel kernel versions with your main Mathematica version is important, especially if you\u2019ve already had SLURM selected by running different Mathematica versions previously (you might see different versions in Kernel program) In this case, select Windows CCS from Cluster Engine and a red error will appear in Advanced Settings. After this select SLURM again as this should set the correct engine for you. Under Kernels , set your desired number (we recommend to set it lower first to test) In Advanced Settings under Native specification , specify time and RAM per kernel, such as \u2014time=02:00:00 \u2014mem=20G (please note that this is RAM per one kernel) If you are using Mathematica 12.3 and above, and if RemoteKernel Objects is enabled, disable it and restart your Mathematica session We recommend to use these commands to start kernels and to check how many kernels have actually been launched (please keep them in the same Mathematica cell and separate by semicolons; Do not use semicolon at the end) $DefaultKernels=$ConfiguredKernels; LaunchKernels[]; Length[Kernels[]] Request Help or Access to Wolfram Alpha Pro If you need any assistance with your Mathematica program, contact us .","title":"Mathematica"},{"location":"clusters-at-yale/guides/mathematica/#mathematica","text":"","title":"Mathematica"},{"location":"clusters-at-yale/guides/mathematica/#open-ondemand","text":"We strongly recommend using Open OnDemand to launch Mathematica. First, open OOD in a browser and navigate to the Apps button. Select All Apps from the drop-down menu and then select Mathematica from the list. Fill in your resource requests and launch your job. 
Once started, click Launch Mathematica and Mathematica will be opened in a new tab in the browser.","title":"Open OnDemand"},{"location":"clusters-at-yale/guides/mathematica/#interactive-job","text":"Alternatively, you could start an interacgive session with X11 forwarding. Warning The Mathematica program is too large to fit on a login node. If you try to run it there, it will crash. Instead, launch it in an interactive job (see below). To run Mathematica interactively, you need to request an interactive session on a compute node. You could start an interactive session using Slurm. For example, to use 4 cores on 1 node: salloc --x11 -c 4 -t 4:00:00 Note that if you are on macOS, you will need to install an additional program to use the GUI. See our X11 Forwarding documentation for instructions. See our Slurm documentation for more detailed information on requesting resources for interactive jobs. To launch Mathematica, you will first need to make sure you have the correct module loaded. You can search for all available Mathematica versions: module avail mathematica Load the appropriate module file. For example, to run version 12.0.0: module load Mathematica/12.0.0 The module load command sets up your environment, including the PATH to find the proper version of the Mathematica program. If you would like to avoid running the load command every session, you can run module save and then the Mathematica module will be loaded every time you login. Once you have the appropriate module loaded in an interactive job, start Mathematica. The & will put the program in the background so you can continue to use your terminal session. Mathematica &","title":"Interactive Job"},{"location":"clusters-at-yale/guides/mathematica/#configure-environment-for-parallel-jobs","text":"Mathematica installed on Yale HPC clusters includes our proprietary scripts to run parallel jobs in SLURM environments. These scripts are designed in a way to allow users to access up to 450 parallel kernels. When a user asks for a specific number of kernels, the wait time to get them might differ dramatically depending on requested computing resources as well as on how busy the HPC cluster is at that moment. To reduce waiting time, our scripts try to launch as many kernels as possible at the moment the user asks for them. Most of the time you will not get launched with the same number of kernels as you requested. We recommend checking the final number of parallel kernels you\u2019ve gotten after the launching command has completed no matter if you run a Front End Mathematica session or execute Wolfram script. One of the ways to check this would be the Mathematica command Length[Kernels[]] . In order to run parallel Mathematica jobs on our cluster, you will need to configure your Mathematica environment. You have to do this within a Front End session. If you run Wolfram script you need to run a Front End session to set your parallel environment before executing your script. Once Mathematica is started, open a new document in the Mathematica window and go to Edit > Preferences . From there, go to Evaluate/Parallel Kernel Configuration and change the following settings: Under Local Kernels , disable Local Kernels if it is enabled Go in Cluster Integration and first enable cluster integration it if it is not enabled Under the Cluster Integration tab, expand the Advanced Settings arrow. 
When you configure parallel kernels for the first time, please select SLURM from the Cluster Engine pull-down menu Matching parallel kernel versions with your main Mathematica version is important, especially if you\u2019ve already had SLURM selected by running different Mathematica versions previously (you might see different versions in Kernel program) In this case, select Windows CCS from Cluster Engine and a red error will appear in Advanced Settings. After this select SLURM again as this should set the correct engine for you. Under Kernels , set your desired number (we recommend to set it lower first to test) In Advanced Settings under Native specification , specify time and RAM per kernel, such as \u2014time=02:00:00 \u2014mem=20G (please note that this is RAM per one kernel) If you are using Mathematica 12.3 and above, and if RemoteKernel Objects is enabled, disable it and restart your Mathematica session We recommend to use these commands to start kernels and to check how many kernels have actually been launched (please keep them in the same Mathematica cell and separate by semicolons; Do not use semicolon at the end) $DefaultKernels=$ConfiguredKernels; LaunchKernels[]; Length[Kernels[]]","title":"Configure Environment for Parallel Jobs"},{"location":"clusters-at-yale/guides/mathematica/#request-help-or-access-to-wolfram-alpha-pro","text":"If you need any assistance with your Mathematica program, contact us .","title":"Request Help or Access to Wolfram Alpha Pro"},{"location":"clusters-at-yale/guides/matlab/","text":"MATLAB MATLAB GUI To use the MATLAB GUI, we recommend our web portal, Open OnDemand . Once logged in, click MATLAB pinned on the dashboard, or select \"MATLAB\" from the \"Interactive Apps\" list. Command Line MATLAB Find MATLAB Run one of the commands below, which will list available versions and the corresponding module files: module avail matlab Load the appropriate module file. For example, to run version R2021a: module load MATLAB/2021a The module load command sets up your environment, including the PATH to find the proper version of the MATLAB program. Run MATLAB Warning If you try to run MATLAB on a login node, it will likely crash. Instead, launch it in an interactive or batch job (see below). Interactive Job (without a GUI) To run MATLAB interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores, 16GiB of RAM for 4 hours with: salloc -c 4 --mem 16G -t 4:00:00 Once your interactive session starts, you can load the appropriate module file and start MATLAB module load MATLAB/2021a # launch the MATLAB command line prompt maltab -nodisplay # launch a script on the command line matlab -nodisplay < runscript.m See our Slurm documentation for more detailed information on requesting resources for interactive jobs. Batch Mode (without a GUI) Create a batch script with the resource requests appropriate to your MATLAB function(s) and script(s). In it load the MATLAB module version you want, then run matlab with the -b option and your function/script name. 
Here is an example that requests 4 CPUs and 18GiB of memory for 8 hours: #!/bin/bash #SBATCH --job-name myjob #SBATCH --cpus-per-task 4 #SBATCH --mem 18G #SBATCH -t 8:00:00 module load MATLAB/2021a # assuming you have your_script.m in the current directory matlab -batch \"your_script\" # if using MATLAB older than R2019a # matlab -nojvm -nodisplay -nosplash < your_script.m Using More than 12 Cores with MATLAB In MATLAB, 12 workers is a poorly documented default limit (seemingly for historical reasons) when setting up the parallel environment. You can override it by explicitly setting up your parpool before calling parfor or other parallel functions. parpool(feature('NumCores'));","title":"MATLAB"},{"location":"clusters-at-yale/guides/matlab/#matlab","text":"","title":"MATLAB"},{"location":"clusters-at-yale/guides/matlab/#matlab-gui","text":"To use the MATLAB GUI, we recommend our web portal, Open OnDemand . Once logged in, click MATLAB pinned on the dashboard, or select \"MATLAB\" from the \"Interactive Apps\" list.","title":"MATLAB GUI"},{"location":"clusters-at-yale/guides/matlab/#command-line-matlab","text":"","title":"Command Line MATLAB"},{"location":"clusters-at-yale/guides/matlab/#find-matlab","text":"Run one of the commands below, which will list available versions and the corresponding module files: module avail matlab Load the appropriate module file. For example, to run version R2021a: module load MATLAB/2021a The module load command sets up your environment, including the PATH to find the proper version of the MATLAB program.","title":"Find MATLAB"},{"location":"clusters-at-yale/guides/matlab/#run-matlab","text":"Warning If you try to run MATLAB on a login node, it will likely crash. Instead, launch it in an interactive or batch job (see below).","title":"Run MATLAB"},{"location":"clusters-at-yale/guides/matlab/#interactive-job-without-a-gui","text":"To run MATLAB interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores, 16GiB of RAM for 4 hours with: salloc -c 4 --mem 16G -t 4:00:00 Once your interactive session starts, you can load the appropriate module file and start MATLAB module load MATLAB/2021a # launch the MATLAB command line prompt maltab -nodisplay # launch a script on the command line matlab -nodisplay < runscript.m See our Slurm documentation for more detailed information on requesting resources for interactive jobs.","title":"Interactive Job (without a GUI)"},{"location":"clusters-at-yale/guides/matlab/#batch-mode-without-a-gui","text":"Create a batch script with the resource requests appropriate to your MATLAB function(s) and script(s). In it load the MATLAB module version you want, then run matlab with the -b option and your function/script name. Here is an example that requests 4 CPUs and 18GiB of memory for 8 hours: #!/bin/bash #SBATCH --job-name myjob #SBATCH --cpus-per-task 4 #SBATCH --mem 18G #SBATCH -t 8:00:00 module load MATLAB/2021a # assuming you have your_script.m in the current directory matlab -batch \"your_script\" # if using MATLAB older than R2019a # matlab -nojvm -nodisplay -nosplash < your_script.m","title":"Batch Mode (without a GUI)"},{"location":"clusters-at-yale/guides/matlab/#using-more-than-12-cores-with-matlab","text":"In MATLAB, 12 workers is a poorly documented default limit (seemingly for historical reasons) when setting up the parallel environment. You can override it by explicitly setting up your parpool before calling parfor or other parallel functions. 
parpool(feature('NumCores'));","title":"Using More than 12 Cores with MATLAB"},{"location":"clusters-at-yale/guides/mpi4py/","text":"MPI Parallelism with Python Note Before venturing into MPI-based parallelism, consider whether your work can be resturctured to make use of dSQ or more \"embarrassingly parallel\" workflows. MPI can be thought of as a \"last resort\" for parallel programming. There are many computational problems that can be have increased performance by running pieces in parallel. These often require communication between the different steps and need a way to send messages between processes. Examples of this include simulations of galaxy formation and electric field simulations, analysis of a single large dataset, or complex search or sort algorithms. MPI and mpi4py There is a standard protocol, called MPI , that defines how messages are passed between processes, including one-to-one and broadcast communications. The Python module for this is called mpi4py : mpi4py Read The Docs Message Passing Interface implemented for Python. Supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects) We will go over a few simple examples here. Definitions COMM : The communication \"world\" defined by MPI RANK : an ID number given to each internal process to define communication SIZE : total number of processes allocated BROADCAST : One-to-many communication SCATTER : One-to-many data distribution GATHER : Many-to-one data distribution mpi4py on the clusters On the clusters, the easiest way to start using mpi4py is to use the module-based software for OpenMPI and Python: # toolchains 2020b and before module load SciPy-bundle/2020.11-foss-2020b # toolchains starting with 2022b module load mpi4py/3.1.4-gompi-2022b Warning mpi4py installed via Conda is unaware of the cluster infrastructure and therefore will likely only work on a single compute node. If you wish to get a conda environment working across multiple nodes, please reach out to hpc@yale.edu for assistance. Cluster Resource Requests MPI utilizes Slurm tasks as the individual parallel workers. Therefore, when requesting resources (either interactively or in batch-mode) the number of tasks will determine the number of parallel workers (or to use MPI's language, the SIZE of the COMM World ). To request four tasks (each with a single CPU) interactively run the following: salloc --cpus-per-task = 1 --ntasks = 4 This can also be achieved in batch-mode by including the following directives in your submission script: #SBATCH --cpus-per-task=1 #SBATCH --ntasks=4 A more detailed discussion of resource requests can be found here and further examples are available here . Examples Ex 1: Rank This is a simple example where each worker reports their RANK and the process ID running that particular task. from mpi4py import MPI # instantize the communication world comm = MPI . COMM_WORLD # get the size of the communication world size = comm . Get_size () # get this particular processes' `rank` ID rank = comm . Get_rank () PID = os . getpid () print ( f 'rank: { rank } has PID: { PID } ' ) We then execute this code (named mpi_simple.py ) by running the following on the command line: mpirun -n 4 python mpi_simple.py The mpirun command is a wrapper for the MPI interface. Then we tell that to set up a COMM_WORLD with 4 workers. 
Finally we tell mpirun to run python mpi_simple.py on each of the four workers. Which outputs the following: rank : 0 has PID : 89134 rank : 1 has PID : 89135 rank : 2 has PID : 89136 rank : 3 has PID : 89137 Ex 2: Point to Point Communicators The most basic communication operators are \" send \" and \" recv \". These can be a bit tricky since they are \"blocking\" commands and can cause the program to hang. comm . send ( obj , dest , tag = 0 ) comm . recv ( source = MPI . ANY_SOURCE , tag = MPI . ANY_TAG , status = None ) tag can be used as a filter dest must be a rank in the current communicator source can be a rank or a wild-card ( MPI.ANY_SOURCE ) status used to retrieve information about recv'd message We now we create a file ( mpi_comm.py ) that contains the following: from mpi4py import MPI comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () if rank == 0 : msg = 'Hello, world' comm . send ( msg , dest = 1 ) elif rank == 1 : s = comm . recv () print ( f \"rank { rank } : { s } \" ) When we run this on the command line ( mpirun -n 4 python mpi_comm.py ) we get the following: rank 1: Hello, world The RANK=0 process sends the message, and the RANK=1 process receives it. The other two processes are effectively bystanders in this example. Ex 3: Broadcast Now we will try a slightly more complicated example that involves sending messages and data between processes. # Import MPI from mpi4py import MPI # Define world comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () # Create some data in the RANK_0 worker if rank == 0 : data = { 'key1' : [ 7 , 2.72 , 2 + 3 j ], 'key2' : ( 'abc' , 'xyz' )} else : data = None # Broadcast the data from RANK_0 to all workers data = comm . bcast ( data , root = 0 ) # Append the RANK ID to the data data [ 'key1' ] . append ( rank ) # Print the resulting data print ( f \"Rank: { rank } , data: { data } \" ) We then execute this code (named mpi_message.py ) by running the following on the command line: mpirun -n 4 python mpi_message.py Which outputs the following: Rank : 0 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 0 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 2 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 2 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 3 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 3 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 1 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 1 ], 'key2' : ( 'abc' , 'xyz' )} Ex 4: Scatter and Gather An effective way of distributing computationally intensive tasks is to scatter pieces of a large dataset to each task. The separate tasks perform some analysis on their chunk of data and then the results are gathered by RANK_0 . This example takes a large array of random numbers and splits it into pieces for each task. These smaller datasets are analyzed (taking an average in this example) and the results are returns to the main task with a Gather call. # import libraries from mpi4py import MPI import numpy as np # set up MPI world comm = MPI . COMM_WORLD size = comm . Get_size () # new: gives number of ranks in comm rank = comm . Get_rank () # generate a large array of data on RANK_0 numData = 100000000 # 100milion values each data = None if rank == 0 : data = np . random . normal ( loc = 10 , scale = 5 , size = numData ) # initialize empty arrays to receive the partial data partial = np . empty ( int ( numData / size ), dtype = 'd' ) # send data to the other workers comm . 
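For instance, a batch script for the mpi_simple.py example above might look like the following (a sketch; the time limit is illustrative and the module shown is the 2022b toolchain build mentioned earlier):
#!/bin/bash
#SBATCH --job-name mpi-simple
#SBATCH --ntasks 4
#SBATCH --cpus-per-task 1
#SBATCH -t 1:00:00

module load mpi4py/3.1.4-gompi-2022b
mpirun -n 4 python mpi_simple.py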
Scatter ( data , partial , root = 0 ) # prepare the reduced array to receive the processed data reduced = None if rank == 0 : reduced = np . empty ( size , dtype = 'd' ) # Average the partial arrays, and then gather them to RANK_0 comm . Gather ( np . average ( partial ), reduced , root = 0 ) if rank == 0 : print ( 'Full Average:' , np . average ( reduced )) This is executed on the command line: mpirun -n 4 python mpi/mpi_scatter.py Which prints: Full Average: 10.00002060397186 Key Take-aways and Further Reading MPI is a powerful tool to set up communication worlds and send data and messages between workers The mpi4py module provides tools for using MPI within Python. This is just the beginning, mpi4py can be used for so much more... To learn more, take a look at the mpi4py tutorial here .","title":"MPI with Python"},{"location":"clusters-at-yale/guides/mpi4py/#mpi-parallelism-with-python","text":"Note Before venturing into MPI-based parallelism, consider whether your work can be resturctured to make use of dSQ or more \"embarrassingly parallel\" workflows. MPI can be thought of as a \"last resort\" for parallel programming. There are many computational problems that can be have increased performance by running pieces in parallel. These often require communication between the different steps and need a way to send messages between processes. Examples of this include simulations of galaxy formation and electric field simulations, analysis of a single large dataset, or complex search or sort algorithms.","title":"MPI Parallelism with Python"},{"location":"clusters-at-yale/guides/mpi4py/#mpi-and-mpi4py","text":"There is a standard protocol, called MPI , that defines how messages are passed between processes, including one-to-one and broadcast communications. The Python module for this is called mpi4py : mpi4py Read The Docs Message Passing Interface implemented for Python. Supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects) We will go over a few simple examples here.","title":"MPI and mpi4py"},{"location":"clusters-at-yale/guides/mpi4py/#definitions","text":"COMM : The communication \"world\" defined by MPI RANK : an ID number given to each internal process to define communication SIZE : total number of processes allocated BROADCAST : One-to-many communication SCATTER : One-to-many data distribution GATHER : Many-to-one data distribution","title":"Definitions"},{"location":"clusters-at-yale/guides/mpi4py/#mpi4py-on-the-clusters","text":"On the clusters, the easiest way to start using mpi4py is to use the module-based software for OpenMPI and Python: # toolchains 2020b and before module load SciPy-bundle/2020.11-foss-2020b # toolchains starting with 2022b module load mpi4py/3.1.4-gompi-2022b Warning mpi4py installed via Conda is unaware of the cluster infrastructure and therefore will likely only work on a single compute node. If you wish to get a conda environment working across multiple nodes, please reach out to hpc@yale.edu for assistance.","title":"mpi4py on the clusters"},{"location":"clusters-at-yale/guides/mpi4py/#cluster-resource-requests","text":"MPI utilizes Slurm tasks as the individual parallel workers. 
Therefore, when requesting resources (either interactively or in batch-mode) the number of tasks will determine the number of parallel workers (or to use MPI's language, the SIZE of the COMM World ). To request four tasks (each with a single CPU) interactively run the following: salloc --cpus-per-task = 1 --ntasks = 4 This can also be achieved in batch-mode by including the following directives in your submission script: #SBATCH --cpus-per-task=1 #SBATCH --ntasks=4 A more detailed discussion of resource requests can be found here and further examples are available here .","title":"Cluster Resource Requests"},{"location":"clusters-at-yale/guides/mpi4py/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/guides/mpi4py/#ex-1-rank","text":"This is a simple example where each worker reports their RANK and the process ID running that particular task. from mpi4py import MPI # instantize the communication world comm = MPI . COMM_WORLD # get the size of the communication world size = comm . Get_size () # get this particular processes' `rank` ID rank = comm . Get_rank () PID = os . getpid () print ( f 'rank: { rank } has PID: { PID } ' ) We then execute this code (named mpi_simple.py ) by running the following on the command line: mpirun -n 4 python mpi_simple.py The mpirun command is a wrapper for the MPI interface. Then we tell that to set up a COMM_WORLD with 4 workers. Finally we tell mpirun to run python mpi_simple.py on each of the four workers. Which outputs the following: rank : 0 has PID : 89134 rank : 1 has PID : 89135 rank : 2 has PID : 89136 rank : 3 has PID : 89137","title":"Ex 1: Rank"},{"location":"clusters-at-yale/guides/mpi4py/#ex-2-point-to-point-communicators","text":"The most basic communication operators are \" send \" and \" recv \". These can be a bit tricky since they are \"blocking\" commands and can cause the program to hang. comm . send ( obj , dest , tag = 0 ) comm . recv ( source = MPI . ANY_SOURCE , tag = MPI . ANY_TAG , status = None ) tag can be used as a filter dest must be a rank in the current communicator source can be a rank or a wild-card ( MPI.ANY_SOURCE ) status used to retrieve information about recv'd message We now we create a file ( mpi_comm.py ) that contains the following: from mpi4py import MPI comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () if rank == 0 : msg = 'Hello, world' comm . send ( msg , dest = 1 ) elif rank == 1 : s = comm . recv () print ( f \"rank { rank } : { s } \" ) When we run this on the command line ( mpirun -n 4 python mpi_comm.py ) we get the following: rank 1: Hello, world The RANK=0 process sends the message, and the RANK=1 process receives it. The other two processes are effectively bystanders in this example.","title":"Ex 2: Point to Point Communicators"},{"location":"clusters-at-yale/guides/mpi4py/#ex-3-broadcast","text":"Now we will try a slightly more complicated example that involves sending messages and data between processes. # Import MPI from mpi4py import MPI # Define world comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () # Create some data in the RANK_0 worker if rank == 0 : data = { 'key1' : [ 7 , 2.72 , 2 + 3 j ], 'key2' : ( 'abc' , 'xyz' )} else : data = None # Broadcast the data from RANK_0 to all workers data = comm . bcast ( data , root = 0 ) # Append the RANK ID to the data data [ 'key1' ] . 
append ( rank ) # Print the resulting data print ( f \"Rank: { rank } , data: { data } \" ) We then execute this code (named mpi_message.py ) by running the following on the command line: mpirun -n 4 python mpi_message.py Which outputs the following: Rank : 0 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 0 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 2 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 2 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 3 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 3 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 1 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 1 ], 'key2' : ( 'abc' , 'xyz' )}","title":"Ex 3: Broadcast"},{"location":"clusters-at-yale/guides/mpi4py/#ex-4-scatter-and-gather","text":"An effective way of distributing computationally intensive tasks is to scatter pieces of a large dataset to each task. The separate tasks perform some analysis on their chunk of data and then the results are gathered by RANK_0 . This example takes a large array of random numbers and splits it into pieces for each task. These smaller datasets are analyzed (taking an average in this example) and the results are returns to the main task with a Gather call. # import libraries from mpi4py import MPI import numpy as np # set up MPI world comm = MPI . COMM_WORLD size = comm . Get_size () # new: gives number of ranks in comm rank = comm . Get_rank () # generate a large array of data on RANK_0 numData = 100000000 # 100milion values each data = None if rank == 0 : data = np . random . normal ( loc = 10 , scale = 5 , size = numData ) # initialize empty arrays to receive the partial data partial = np . empty ( int ( numData / size ), dtype = 'd' ) # send data to the other workers comm . Scatter ( data , partial , root = 0 ) # prepare the reduced array to receive the processed data reduced = None if rank == 0 : reduced = np . empty ( size , dtype = 'd' ) # Average the partial arrays, and then gather them to RANK_0 comm . Gather ( np . average ( partial ), reduced , root = 0 ) if rank == 0 : print ( 'Full Average:' , np . average ( reduced )) This is executed on the command line: mpirun -n 4 python mpi/mpi_scatter.py Which prints: Full Average: 10.00002060397186","title":"Ex 4: Scatter and Gather"},{"location":"clusters-at-yale/guides/mpi4py/#key-take-aways-and-further-reading","text":"MPI is a powerful tool to set up communication worlds and send data and messages between workers The mpi4py module provides tools for using MPI within Python. This is just the beginning, mpi4py can be used for so much more... To learn more, take a look at the mpi4py tutorial here .","title":"Key Take-aways and Further Reading"},{"location":"clusters-at-yale/guides/mysql/","text":"Mysql Mysql is a popular relational database. Because a database is usually thought of as a persistent service, it is not ordinarily run on HPC clusters, since allocations on an HPC cluster are temporary. If you need a persistent mysql database server, we recommend either installing mysql on a server in your lab, or using ITS's Spinup service. In either case, the mysql server can be accessed remotely from the HPC clusters. However, there are some use cases for running a mysql server on the cluster that do make sense. For example, some applications store their data in a mysql database that only needs to run when the application runs. Most instructions for installing mysql involve creating a persistent server and require admin privileges. 
The instructions that follow walk you through the process of running a mysql server using Apptainer on a cluster compute node without any special privileges. It uses an Apptainer container developed by Robert Grandin at Iowa State (Thanks!) All of the following must be done on an allocated compute node. Do not do this on the login node! Step 1: Create an installation directory somewhere, and cd to it mkdir ~/project/mysql cd ~/project/mysql Step 2: Create two config files Put the following in ~/.my.cnf. Note that you should change the password in both files to something else. [mysqld] innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] user=root password='my-secret-pw' Put the following in ~/.mysqlrootpw SET PASSWORD FOR 'root'@'localhost' = PASSWORD('my-secret-pw'); Step 3: Create data directories for mysql mkdir -p ${PWD}/mysql/var/lib/mysql ${PWD}/mysql/run/mysqld Step 4: Make a link to the mysql image file The mysqld image file can be found under the apps tree on each cluster. For example, on Grace: /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg We recommend that you make a link to it in your mysql directory: ln -s /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg mysql.simg Step 5: Start the container. Note that this doesn't actually start the service yet. apptainer instance start --bind ${HOME} \\ --bind ${PWD}/mysql/var/lib/mysql/:/var/lib/mysql \\ --bind ${PWD}/mysql/run/mysqld:/run/mysqld \\ ./mysql.simg mysql To check that it is running: apptainer instance list Step 6: Start the mysqld server within the container apptainer run instance://mysql You'll see lots of output, but at the end you should see a message like this 2022-02-21T17:16:21.104527Z 0 [Note] mysqld: ready for connections. Version: '5.7.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL) Step 7: Enter the running container apptainer exec instance://mysql /bin/bash Connect locally as root user while in the container, using the password you set in the config files in step 2. Singularity> mysql -u root -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \\g. Your MySQL connection id is 3 Server version: 5.7.21 MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement. mysql> Success! The server is working! Type exit to get out of mysql, but remain in the container: Step 8: Add a database user and permit it to login remotely Next, in order to connect from outside the container, you need to add a user that is allowed to connect remotely and give that user permissions. This is one way to do that from the container shell. You should probably substitute your name for elmerfudd and a better password for mypasswd! Singularity> mysql -e \"GRANT ALL PRIVILEGES ON *.* TO 'elmerfudd'@'%' IDENTIFIED BY 'mypasswd' WITH GRANT OPTION\" Singularity> mysql -e \"FLUSH PRIVILEGES\" Type exit to leave the container. From that compute node, but outside the container, try connecting with: mysql -u elmerfudd -h 127.0.0.1 -p Now try connecting to that server from a different compute node by using the hostname of the node where the server is running (e.g. 
c22n01) instead of 127.0.0.1 mysql -u elmerfudd -h c22n01 -p While connected, you can try actually using the server in the usual way to create a database and table: MySQL [(none)]> create database rob; Query OK, 1 row affected (0.00 sec) MySQL [(none)]> use rob Database changed MySQL [rob]> create table users (name VARCHAR(20), id INT); Query OK, 0 rows affected (0.11 sec) ... Success! You've earned a reward of your choice! Step 9 Shut the container down. apptainer instance stop mysql Now that everything is installed, the next time you want to start the server, you'll only need to do steps 5 (starting the container) and 6 (starting the mysql server). Note that you'll run into a problem if two mysql instances are run on the same compute node, since by default they each try to use port 3306. The simplest solution is to specify a non-standard port in your .my.cnf file: [mysqld] port=3310 innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] port=3310 user=root password='my-secret-pw'","title":"Mysql"},{"location":"clusters-at-yale/guides/mysql/#mysql","text":"Mysql is a popular relational database. Because a database is usually thought of as a persistent service, it is not ordinarily run on HPC clusters, since allocations on an HPC cluster are temporary. If you need a persistent mysql database server, we recommend either installing mysql on a server in your lab, or using ITS's Spinup service. In either case, the mysql server can be accessed remotely from the HPC clusters. However, there are some use cases for running a mysql server on the cluster that do make sense. For example, some applications store their data in a mysql database that only needs to run when the application runs. Most instructions for installing mysql involve creating a persistent server and require admin privileges. The instructions that follow walk you through the process of running a mysql server using Apptainer on a cluster compute node without any special privileges. It uses an Apptainer container developed by Robert Grandin at Iowa State (Thanks!) All of the following must be done on an allocated compute node. Do not do this on the login node!","title":"Mysql"},{"location":"clusters-at-yale/guides/mysql/#step-1-create-an-installation-directory-somewhere-and-cd-to-it","text":"mkdir ~/project/mysql cd ~/project/mysql","title":"Step 1: Create an installation directory somewhere, and cd to it"},{"location":"clusters-at-yale/guides/mysql/#step-2-create-two-config-files","text":"Put the following in ~/.my.cnf. Note that you should change the password in both files to something else. [mysqld] innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] user=root password='my-secret-pw' Put the following in ~/.mysqlrootpw SET PASSWORD FOR 'root'@'localhost' = PASSWORD('my-secret-pw');","title":"Step 2: Create two config files"},{"location":"clusters-at-yale/guides/mysql/#step-3-create-data-directories-for-mysql","text":"mkdir -p ${PWD}/mysql/var/lib/mysql ${PWD}/mysql/run/mysqld","title":"Step 3: Create data directories for mysql"},{"location":"clusters-at-yale/guides/mysql/#step-4-make-a-link-to-the-mysql-image-file","text":"The mysqld image file can be found under the apps tree on each cluster. 
For example, on Grace: /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg We recommend that you make a link to it in your mysql directory: ln -s /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg mysql.simg","title":"Step 4: Make a link to the mysql image file"},{"location":"clusters-at-yale/guides/mysql/#step-5-start-the-container-note-that-this-doesnt-actually-start-the-service-yet","text":"apptainer instance start --bind ${HOME} \\ --bind ${PWD}/mysql/var/lib/mysql/:/var/lib/mysql \\ --bind ${PWD}/mysql/run/mysqld:/run/mysqld \\ ./mysql.simg mysql To check that it is running: apptainer instance list","title":"Step 5: Start the container. Note that this doesn't actually start the service yet."},{"location":"clusters-at-yale/guides/mysql/#step-6-start-the-mysqld-server-within-the-container","text":"apptainer run instance://mysql You'll see lots of output, but at the end you should see a message like this 2022-02-21T17:16:21.104527Z 0 [Note] mysqld: ready for connections. Version: '5.7.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)","title":"Step 6: Start the mysqld server within the container"},{"location":"clusters-at-yale/guides/mysql/#step-7-enter-the-running-container","text":"apptainer exec instance://mysql /bin/bash Connect locally as root user while in the container, using the password you set in the config files in step 2. Singularity> mysql -u root -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \\g. Your MySQL connection id is 3 Server version: 5.7.21 MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement. mysql> Success! The server is working! Type exit to get out of mysql, but remain in the container:","title":"Step 7: Enter the running container"},{"location":"clusters-at-yale/guides/mysql/#step-8-add-a-database-user-and-permit-it-to-login-remotely","text":"Next, in order to connect from outside the container, you need to add a user that is allowed to connect remotely and give that user permissions. This is one way to do that from the container shell. You should probably substitute your name for elmerfudd and a better password for mypasswd! Singularity> mysql -e \"GRANT ALL PRIVILEGES ON *.* TO 'elmerfudd'@'%' IDENTIFIED BY 'mypasswd' WITH GRANT OPTION\" Singularity> mysql -e \"FLUSH PRIVILEGES\" Type exit to leave the container. From that compute node, but outside the container, try connecting with: mysql -u elmerfudd -h 127.0.0.1 -p Now try connecting to that server from a different compute node by using the hostname of the node where the server is running (e.g. c22n01) instead of 127.0.0.1 mysql -u elmerfudd -h c22n01 -p While connected, you can try actually using the server in the usual way to create a database and table: MySQL [(none)]> create database rob; Query OK, 1 row affected (0.00 sec) MySQL [(none)]> use rob Database changed MySQL [rob]> create table users (name VARCHAR(20), id INT); Query OK, 0 rows affected (0.11 sec) ... Success! 
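Once the remote account from step 8 exists, you can also run queries non-interactively from another node, which is handy inside batch scripts. This is a sketch reusing the example elmerfudd account, mypasswd password, and c22n01 hostname from above; substitute your own values, and change -P if you configured a non-default port in ~/.my.cnf.

# run a one-off query against the server on c22n01 without opening an interactive session
mysql -u elmerfudd --password='mypasswd' -h c22n01 -P 3306 \
    -e "SHOW DATABASES; SELECT COUNT(*) FROM rob.users;"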
You've earned a reward of your choice!","title":"Step 8: Add a database user and permit it to login remotely"},{"location":"clusters-at-yale/guides/mysql/#step-9-shut-the-container-down","text":"apptainer instance stop mysql Now that everything is installed, the next time you want to start the server, you'll only need to do steps 5 (starting the container) and 6 (starting the mysql server). Note that you'll run into a problem if two mysql instances are run on the same compute node, since by default they each try to use port 3306. The simplest solution is to specify a non-standard port in your .my.cnf file: [mysqld] port=3310 innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] port=3310 user=root password='my-secret-pw'","title":"Step 9 Shut the container down."},{"location":"clusters-at-yale/guides/namd/","text":"NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations. NAMD uses the popular molecular graphics program VMD , for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.To see a full list of available versions of NAMD on the cluster, run: module avail namd/ As of this writing, the latest installed version is 2.13. Running NAMD on the Cluster To set up NAMD on the cluster, module load NAMD/2.13-multicore for the standard multicore version, or module load NAMD/2.13-multicore-CUDA for the GPU-enabled version (about which there is more information below). NAMD can be run interactively, or as a batch job. To run NAMD interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc --x11 -c 4 -t 4 :00:00 For longer simulations, you will generally want to run non-interactively via a batch job . Parallelization NAMD is most effective when run with parallelization. For running on a single node, namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile where ${SLURM_CPUS_PER_TASK} is set by your \"-c\" job resource request. NAMD uses charm++ parallel objects for multinode parallelization and the program launch uses the charmrun interface. Setting up a multinode run in a way that provides improved performance can be a complicated undertaking. If you wish to run a multinode NAMD job and are not already familiar with MPI, feel free to contact the YCRC staff for assistance. GPUs To use the GPU-accelerated version, request GPU resources for your SLURM job using salloc or via a submission script, and load a CUDA-enabled version of NAMD: module load NAMD/2.13-multicore-CUDA For a single-node run, you will need at least one thread for each GPU you want to use: #SBATCH -c 4 --gpus=4 ... charmrun ++local namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile","title":"NAMD"},{"location":"clusters-at-yale/guides/namd/#namd","text":"NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations. 
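For a non-interactive run, the module load and namd2 commands described above can be combined into a batch script. The sketch below is a minimal single-node example; the job name, core count, walltime, and YourConfigfile name are assumptions to replace with your own.

#!/bin/bash
#SBATCH -J namd_sim
#SBATCH -c 8
#SBATCH -t 24:00:00
module load NAMD/2.13-multicore
# one worker thread per allocated CPU
namd2 +p ${SLURM_CPUS_PER_TASK} YourConfigfile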
NAMD uses the popular molecular graphics program VMD , for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.To see a full list of available versions of NAMD on the cluster, run: module avail namd/ As of this writing, the latest installed version is 2.13.","title":"NAMD"},{"location":"clusters-at-yale/guides/namd/#running-namd-on-the-cluster","text":"To set up NAMD on the cluster, module load NAMD/2.13-multicore for the standard multicore version, or module load NAMD/2.13-multicore-CUDA for the GPU-enabled version (about which there is more information below). NAMD can be run interactively, or as a batch job. To run NAMD interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc --x11 -c 4 -t 4 :00:00 For longer simulations, you will generally want to run non-interactively via a batch job .","title":"Running NAMD on the Cluster"},{"location":"clusters-at-yale/guides/namd/#parallelization","text":"NAMD is most effective when run with parallelization. For running on a single node, namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile where ${SLURM_CPUS_PER_TASK} is set by your \"-c\" job resource request. NAMD uses charm++ parallel objects for multinode parallelization and the program launch uses the charmrun interface. Setting up a multinode run in a way that provides improved performance can be a complicated undertaking. If you wish to run a multinode NAMD job and are not already familiar with MPI, feel free to contact the YCRC staff for assistance.","title":"Parallelization"},{"location":"clusters-at-yale/guides/namd/#gpus","text":"To use the GPU-accelerated version, request GPU resources for your SLURM job using salloc or via a submission script, and load a CUDA-enabled version of NAMD: module load NAMD/2.13-multicore-CUDA For a single-node run, you will need at least one thread for each GPU you want to use: #SBATCH -c 4 --gpus=4 ... charmrun ++local namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile","title":"GPUs"},{"location":"clusters-at-yale/guides/parallel/","text":"Parallel GNU Parallel a simple but powerful way to run independent tasks in parallel. Although it is possible to run on multiple nodes, it is simplest to run on multiple cpus of a single node, and that is what we will consider here. Note that what is presented here just scratches the surface of what parallel can do. Basic Examples Loop Let's parallelize the following bash loop that prints the letters a through f using bash's brace expansion : for letter in { a..f } ; do echo $letter done ... which produces the following output: a b c d e f To achieve the same result, parallel starts some number of workers and then runs tasks on them. The number of workers and tasks need not be the same. You specify the number of workers with -j . The tasks can be generated with a list of arguments specified after the separator ::: . For parallel to perform well, you should allocate at least the same number of CPUs as workers with the slurm option --cpus-per-task or more simply -c . salloc -c 4 module load parallel parallel -j 4 \"echo {}\" ::: { a..f } This runs four workers that each run echo , filling in the argument {} with the next item in the list. This produces the output: Nested Loop Let's parallelize the following nested bash loop. for letter in { a..c } do for number in { 1 ..7..2 } do echo $letter $number done done ... 
which produces the following output: a 1 a 2 a 3 b 1 b 2 b 3 c 1 c 2 c 3 You can use the ::: separator with parallel to specify multiple lists of parameters you would like to iterate over. Then you can refer to them by one-based index, e.g. list one is {1} . Using these, you can ask parallel to execute combinations of parameters. Here is a way to recreate the result of the serial bash loop above: parallel -j 4 \"echo {1} {2}\" ::: { a..c } ::: { 1 ..3 } Advanced Examples md5sum You have a number of files scattered throughout a directory tree. Their names end with fastq.gz, e.g. d1/d3/sample3.fastq.gz. You'd like to run md5sum on each, and put the output in a file in the same directory, with a filename ending with .md5sum, e.g. d1/d3/sample3.md5sum. Here is a script that will do that in parallel, using 16 cpus on one node of the cluster: #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } --plus \"echo {}; md5sum {} > {/fastq.gz/md5sum.new}\" ::: $( find . -name \"*.fastq.gz\" -print ) The $(find . -name \"*.fastq.gz\" -print) portion of the command returns all of the files of interest. They will be plugged into the {} in the md5sum command. {/fastq.gz/md5sum.new} does a string replacement on the filename, producing the desired output filename. String replacement requires the --plus flag to parallel, which enables a number of powerful string manipulation features. Finally, we pass -j ${SLURM_CPUS_PER_TASK} so that parallel will use all of the allocated cpus, however many there are. Parameter Sweep You want to run a simulation program that takes a number of input parameters, and you want to sample a variety of values for each parameter. #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } simulate { 1 } { 2 } { 3 } ::: { 1 ..5 } ::: 2 16 ::: { 5 ..50..5 } This will run 100 jobs, each with parameters that vary as : simulate 1 2 5 simulate 1 2 10 simulate 1 2 15 ... simulate 5 16 45 simulate 5 16 50 If simulate doesn't create unique output based on parameters, you can use redirection so you can review results from each task. You'll need to use quotes so that the > is seen as part of the command: parallel -j ${ SLURM_CPUS_PER_TASK } \"simulate {1} {2} {3} > results_{1}_{2}_{3}.out\" ::: $( seq 1 5 ) ::: 2 16 ::: $( seq 5 5 50 )","title":"Parallel"},{"location":"clusters-at-yale/guides/parallel/#parallel","text":"GNU Parallel a simple but powerful way to run independent tasks in parallel. Although it is possible to run on multiple nodes, it is simplest to run on multiple cpus of a single node, and that is what we will consider here. Note that what is presented here just scratches the surface of what parallel can do.","title":"Parallel"},{"location":"clusters-at-yale/guides/parallel/#basic-examples","text":"","title":"Basic Examples"},{"location":"clusters-at-yale/guides/parallel/#loop","text":"Let's parallelize the following bash loop that prints the letters a through f using bash's brace expansion : for letter in { a..f } ; do echo $letter done ... which produces the following output: a b c d e f To achieve the same result, parallel starts some number of workers and then runs tasks on them. The number of workers and tasks need not be the same. You specify the number of workers with -j . The tasks can be generated with a list of arguments specified after the separator ::: . For parallel to perform well, you should allocate at least the same number of CPUs as workers with the slurm option --cpus-per-task or more simply -c . 
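When a sweep has many tasks, it helps to record which ones have already run. The sketch below adds GNU Parallel's --joblog and --resume options (standard parallel flags not covered above) to the parameter-sweep pattern, so a resubmitted job skips tasks that already completed; the log file name is a placeholder.

#!/bin/bash
#SBATCH -c 16
module load parallel
# --joblog records each task's exit status; --resume (used with --joblog) reruns only tasks missing from the log
parallel -j ${SLURM_CPUS_PER_TASK} --joblog sweep.log --resume \
    "simulate {1} {2} {3} > results_{1}_{2}_{3}.out" ::: $(seq 1 5) ::: 2 16 ::: $(seq 5 5 50)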
salloc -c 4 module load parallel parallel -j 4 \"echo {}\" ::: { a..f } This runs four workers that each run echo , filling in the argument {} with the next item in the list. This produces the output:","title":"Loop"},{"location":"clusters-at-yale/guides/parallel/#nested-loop","text":"Let's parallelize the following nested bash loop. for letter in { a..c } do for number in { 1 ..7..2 } do echo $letter $number done done ... which produces the following output: a 1 a 2 a 3 b 1 b 2 b 3 c 1 c 2 c 3 You can use the ::: separator with parallel to specify multiple lists of parameters you would like to iterate over. Then you can refer to them by one-based index, e.g. list one is {1} . Using these, you can ask parallel to execute combinations of parameters. Here is a way to recreate the result of the serial bash loop above: parallel -j 4 \"echo {1} {2}\" ::: { a..c } ::: { 1 ..3 }","title":"Nested Loop"},{"location":"clusters-at-yale/guides/parallel/#advanced-examples","text":"","title":"Advanced Examples"},{"location":"clusters-at-yale/guides/parallel/#md5sum","text":"You have a number of files scattered throughout a directory tree. Their names end with fastq.gz, e.g. d1/d3/sample3.fastq.gz. You'd like to run md5sum on each, and put the output in a file in the same directory, with a filename ending with .md5sum, e.g. d1/d3/sample3.md5sum. Here is a script that will do that in parallel, using 16 cpus on one node of the cluster: #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } --plus \"echo {}; md5sum {} > {/fastq.gz/md5sum.new}\" ::: $( find . -name \"*.fastq.gz\" -print ) The $(find . -name \"*.fastq.gz\" -print) portion of the command returns all of the files of interest. They will be plugged into the {} in the md5sum command. {/fastq.gz/md5sum.new} does a string replacement on the filename, producing the desired output filename. String replacement requires the --plus flag to parallel, which enables a number of powerful string manipulation features. Finally, we pass -j ${SLURM_CPUS_PER_TASK} so that parallel will use all of the allocated cpus, however many there are.","title":"md5sum"},{"location":"clusters-at-yale/guides/parallel/#parameter-sweep","text":"You want to run a simulation program that takes a number of input parameters, and you want to sample a variety of values for each parameter. #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } simulate { 1 } { 2 } { 3 } ::: { 1 ..5 } ::: 2 16 ::: { 5 ..50..5 } This will run 100 jobs, each with parameters that vary as : simulate 1 2 5 simulate 1 2 10 simulate 1 2 15 ... simulate 5 16 45 simulate 5 16 50 If simulate doesn't create unique output based on parameters, you can use redirection so you can review results from each task. You'll need to use quotes so that the > is seen as part of the command: parallel -j ${ SLURM_CPUS_PER_TASK } \"simulate {1} {2} {3} > results_{1}_{2}_{3}.out\" ::: $( seq 1 5 ) ::: 2 16 ::: $( seq 5 5 50 )","title":"Parameter Sweep"},{"location":"clusters-at-yale/guides/python/","text":"Python Python is a language and free software distribution that is used for websites, system administration, security testing, and scientific computing, to name a few. On the Yale Clusters there are a couple ways in which you can set up Python environments. The default python provided is the minimal install of Python 2.7 that comes with Red Hat Enterprise Linux 7. We strongly recommend that you use one of the methods below to set up your own python environment. 
The Python Module We provide a Python as a software module . We include frozen versions of many common packages used for scientific computing. Find and Load Python Find the available versions of Python version 3 with: module avail Python/3 To load version 3.7.0: module load Python/3.7.0-foss-2018b To show installed Python packages and their versions for the Python/3.7.0-foss-2018b module: module help Python/3.7.0-foss-2018b Install Packages We recommend against installing python packages with pip after having loaded the Python module. Doing so installs them to your home directory in a way that does not make it clear to other python installs what environment the packages you installed belong to. Instead we recommend using virtualenv or Conda environments. We like conda because of all the additional pre-compiled software it makes available. Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. If you do pip install packages, do so in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. Conda-based Python Environments You can easily set up multiple Python installations side-by-side using the conda command. With Conda you can manage your own packages and dependencies for Python, R, etc. See our guide for more detailed instructions. # install once module load miniconda conda create -n py3_env python = 3 numpy scipy matplotlib ipython jupyter jupyterlab # use later module purge && module load miniconda conda activate py3_env Run Python We will kill Python jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your computation in jobs. See our Slurm documentation for more detailed information on submitting jobs. Interactive Job To run Python interactively, first launch an interactive job on a compute node. If your Python sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start python or ipython on your command prompt. If you are happy with your Python commands, save them to a file which can then be submitted and run as a batch job. Batch Mode To run Python in batch mode, create a plain-text batch script to submit. In that script, you call your Python script. In this case myscript.py is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_python_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load miniconda conda activate py3_env python myscript.py To actually submit the job, run sbatch my_py_job.sh where the batch script above was saved as my_py_job.sh . Jupyter Notebooks You can run Jupyter notebooks & JupyterLab by submitting your notebook server as a job. See our page dedicated to Jupyter for more info.","title":"Python"},{"location":"clusters-at-yale/guides/python/#python","text":"Python is a language and free software distribution that is used for websites, system administration, security testing, and scientific computing, to name a few. On the Yale Clusters there are a couple ways in which you can set up Python environments. The default python provided is the minimal install of Python 2.7 that comes with Red Hat Enterprise Linux 7. 
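The page recommends virtualenv or Conda instead of bare pip installs but only shows the Conda workflow. Below is a minimal virtualenv sketch, assuming the Python/3.7.0-foss-2018b module listed above and an environment location in your project space; both are placeholders you can change.

# build in an interactive job on the oldest architecture so the environment works on all nodes
salloc -C oldest
module load Python/3.7.0-foss-2018b
python -m venv ~/project/my_venv          # create the environment once
source ~/project/my_venv/bin/activate     # activate it in later sessions and batch jobs
pip install numpy                         # packages now install into the venv, not your home directory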
We strongly recommend that you use one of the methods below to set up your own python environment.","title":"Python"},{"location":"clusters-at-yale/guides/python/#the-python-module","text":"We provide a Python as a software module . We include frozen versions of many common packages used for scientific computing.","title":"The Python Module"},{"location":"clusters-at-yale/guides/python/#find-and-load-python","text":"Find the available versions of Python version 3 with: module avail Python/3 To load version 3.7.0: module load Python/3.7.0-foss-2018b To show installed Python packages and their versions for the Python/3.7.0-foss-2018b module: module help Python/3.7.0-foss-2018b","title":"Find and Load Python"},{"location":"clusters-at-yale/guides/python/#install-packages","text":"We recommend against installing python packages with pip after having loaded the Python module. Doing so installs them to your home directory in a way that does not make it clear to other python installs what environment the packages you installed belong to. Instead we recommend using virtualenv or Conda environments. We like conda because of all the additional pre-compiled software it makes available. Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. If you do pip install packages, do so in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes.","title":"Install Packages"},{"location":"clusters-at-yale/guides/python/#conda-based-python-environments","text":"You can easily set up multiple Python installations side-by-side using the conda command. With Conda you can manage your own packages and dependencies for Python, R, etc. See our guide for more detailed instructions. # install once module load miniconda conda create -n py3_env python = 3 numpy scipy matplotlib ipython jupyter jupyterlab # use later module purge && module load miniconda conda activate py3_env","title":"Conda-based Python Environments"},{"location":"clusters-at-yale/guides/python/#run-python","text":"We will kill Python jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your computation in jobs. See our Slurm documentation for more detailed information on submitting jobs.","title":"Run Python"},{"location":"clusters-at-yale/guides/python/#interactive-job","text":"To run Python interactively, first launch an interactive job on a compute node. If your Python sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start python or ipython on your command prompt. If you are happy with your Python commands, save them to a file which can then be submitted and run as a batch job.","title":"Interactive Job"},{"location":"clusters-at-yale/guides/python/#batch-mode","text":"To run Python in batch mode, create a plain-text batch script to submit. In that script, you call your Python script. In this case myscript.py is in the same directory as the batch script, batch script contents shown below. 
#!/bin/bash #SBATCH -J my_python_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load miniconda conda activate py3_env python myscript.py To actually submit the job, run sbatch my_py_job.sh where the batch script above was saved as my_py_job.sh .","title":"Batch Mode"},{"location":"clusters-at-yale/guides/python/#jupyter-notebooks","text":"You can run Jupyter notebooks & JupyterLab by submitting your notebook server as a job. See our page dedicated to Jupyter for more info.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/r/","text":"R R is a free software environment for statistical computing and graphics. On the Yale Clusters there are a couple ways in which you can set up your R environment. There is no R executable provided by default; you have to choose one of the following methods to be able to run R. The R Module We provide several versions of R as software modules . These modules provide a broad selection of commonly used packages pre-installed. Notably, this includes a number of geospatial packages like sf , sp , raster , and terra . In addition, we install a collection of the most common bioconductor bioinformatics packages ( homepage ) called R-bundle-Bioconductor . This can be loaded in addition to the matching R module to provide simple access to these tools. Find and Load R Find the available versions of R version 4 with: module avail R/4 To load version 4.2.0: module load R/4.2.0-foss-2020b To show installed R packages and their versions for the R/4.2.0 module: module help R/4.2.0-foss-2020b Between the base R module and the R-bundle-Bioconductor module, there are over 1000 R packages installed and ready to use. To find if your desired package is available in these modules, you can run module spider $PACKAGE/$VERSION : module spider Seurat/4.1.1 -------------------------------------------------------------------------------------------------------------------------------------------------------- Seurat: Seurat/4.1.1 ( E ) -------------------------------------------------------------------------------------------------------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 Names marked by a trailing ( E ) are extensions provided by another module. So to get this version of Seurat, you can load the R-bundle-Bioconductor module. Then you simple library(Seurat) to use that tool. Install Packages The software modules include many commonly used packages, but you can install additional packages specifically for your account. As part of the R software modules we define an environment variable which directs R to install packages to your project space. This helps prevent issues where R cannot install packages due to home-space quotas. To change the location of where R installs packages, the R_LIBS_USER variable can be set in your ~/.bashrc file: export R_LIBS_USER=$GIBBS_PROJECT/R/%v where %v is a placeholder for the R major and minor version number (e.g. 4.2 ). R will replace this variable with the correct value automatically to segregate packages installed with different versions of R. We recommend you install packages in an interactive job with the slurm option -C oldest . This will ensure the compiled portions of your R library are compatible with all compute nodes on the cluster. 
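If you prefer to script the installation rather than type it into an interactive R session, the same steps can be wrapped in a batch job. This is a sketch assuming the R/4.2.0-foss-2020b module and the lattice package used as an example on this page; the job name and walltime are placeholders.

#!/bin/bash
#SBATCH -J r_pkg_install
#SBATCH -C oldest
#SBATCH -t 1:00:00
module load R/4.2.0-foss-2020b
# installs into the location given by R_LIBS_USER (set in ~/.bashrc as described above)
Rscript -e 'install.packages("lattice", repos="http://cran.r-project.org")'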
If there is a missing library your package of interest needs you should be able to load its module. If you cannot find a dependency or have trouble installing an R package, please get in touch with us . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always install packages in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. To get started load the R module and start R: salloc module load R/4.2.0-foss-2020b R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) This will throw a warning like: Warning in install.packages ( \"lattice\" ) : 'lib = \"/ysm-gpfs/apps/software/R/4.2.0-foss-2020b/lib64/R/library\"' is not writable Would you like to create a personal library /gpfs/gibbs/project/support/tl397/R/4.1 to install packages into? ( y/n ) Note If you encounter a permission error because the installation does not prompt you to create a personal library, create the directory in the default location in your home directory for the version of R you are using; e.g., mkdir /path/to/your/project/space/R/4.2 You only need the general minor version such as 4.2 instead of 4.2.2. You can customize where packages are installed and accessed for a particular R session using the .libPaths function in R: # List current package locations > .libPaths() # Add new default location to the standard defaults, e.g. project/my_R_libs > .libPaths(c(\"/home/netID/project/my_R_libs/\", .libPaths())) Run R We will kill R jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your R computation in jobs. See our Slurm documentation for more detailed information on submitting jobs. Interactive Job To run R interactively, first launch an interactive job on a compute node. If your R sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start R by entering R on your command prompt. If you are happy with your R commands, save them to a file which can then be submitted and run as a batch job. Batch Mode To run R in batch mode, create a plain-text batch script to submit. In that script, you can run your R script. In this case myscript.R is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_r_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load R/4.1.0-foss-2020b Rscript myscript.R To actually submit the job, run sbatch my_r_job.sh where the batch script above was saved as my_r_job.sh . RStudio You can run RStudio app via Open Ondemand . Here you can select the desired version of R and RStudio and launch an interactive compute session. Parallel R On a cluster you may want to use R in parallel across multiple nodes. While there are a few different ways this can be achieved, we recommend using the R software module which already includes Rmpi , parallel , and doMC . 
To test it, we can create a simple R script named ex1.R library ( \"Rmpi\" ) n <- mpi.comm.size ( 0 ) me <- mpi.comm.rank ( 0 ) mpi.barrier ( 0 ) val <- 777 mpi.bcast ( val , 1 , 0 , 0 ) print ( paste ( \"me\" , me , \"val\" , val )) mpi.barrier ( 0 ) mpi.quit () Then we can launch it with an sbatch script ( ex1.sh ): #!/bin/bash #SBATCH -n 4 #SBATCH -t 5:00 module purge module load R/4.1.0-foss-2020b srun Rscript ex1.R This script should execute a simple broadcast and complete in a few seconds. Virtual Display Session It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". There is a tool, xvfb , which can help avoid these issues. The wrapper xvfb-run creates a virtual display session which allows R to create these figures without an X11 session. See the guide for xvfb for more details. Conda-based R Environments If there isn't a module available for the version of R you want, you can set up your own R installation using Conda . With Conda you can manage your own packages and dependencies, for R, Python, etc. Most of the time the best way to install R packages for your Conda R environment is via conda . # load miniconda module load miniconda # create the conda environment including r-base and r-essentials package collections conda create --name my_r_env r-base r-essentials # activate the environment conda activate my_r_env # Install the lattice package (r-lattice) conda install r-lattice If there are packages that conda does not provide, you can install using the install.packages function, but this may occasionally not work as well. When you install packages with install.packages make sure to activate your Conda environment first. salloc module load miniconda source activate my_r_env R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) Warning Conda-based R may not work properly with parallel packages like Rmpi when running across multiple compute nodes. In general, it's best to use the module installation of R for anything which requires MPI.","title":"R"},{"location":"clusters-at-yale/guides/r/#r","text":"R is a free software environment for statistical computing and graphics. On the Yale Clusters there are a couple ways in which you can set up your R environment. There is no R executable provided by default; you have to choose one of the following methods to be able to run R.","title":"R"},{"location":"clusters-at-yale/guides/r/#the-r-module","text":"We provide several versions of R as software modules . These modules provide a broad selection of commonly used packages pre-installed. Notably, this includes a number of geospatial packages like sf , sp , raster , and terra . In addition, we install a collection of the most common bioconductor bioinformatics packages ( homepage ) called R-bundle-Bioconductor . This can be loaded in addition to the matching R module to provide simple access to these tools.","title":"The R Module"},{"location":"clusters-at-yale/guides/r/#find-and-load-r","text":"Find the available versions of R version 4 with: module avail R/4 To load version 4.2.0: module load R/4.2.0-foss-2020b To show installed R packages and their versions for the R/4.2.0 module: module help R/4.2.0-foss-2020b Between the base R module and the R-bundle-Bioconductor module, there are over 1000 R packages installed and ready to use. 
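The xvfb-run wrapper mentioned above slots directly into a batch script. This is a minimal sketch assuming the R/4.2.0-foss-2020b module and a plotting script named myscript.R, which is a placeholder for your own script.

#!/bin/bash
#SBATCH -J r_figures
#SBATCH --mem=10G
#SBATCH -t 4:00:00
module load R/4.2.0-foss-2020b
# xvfb-run supplies a virtual X11 display so PNG/X11 graphics devices work without a real display
xvfb-run Rscript myscript.R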
To find if your desired package is available in these modules, you can run module spider $PACKAGE/$VERSION : module spider Seurat/4.1.1 -------------------------------------------------------------------------------------------------------------------------------------------------------- Seurat: Seurat/4.1.1 ( E ) -------------------------------------------------------------------------------------------------------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 Names marked by a trailing ( E ) are extensions provided by another module. So to get this version of Seurat, you can load the R-bundle-Bioconductor module. Then you simple library(Seurat) to use that tool.","title":"Find and Load R"},{"location":"clusters-at-yale/guides/r/#install-packages","text":"The software modules include many commonly used packages, but you can install additional packages specifically for your account. As part of the R software modules we define an environment variable which directs R to install packages to your project space. This helps prevent issues where R cannot install packages due to home-space quotas. To change the location of where R installs packages, the R_LIBS_USER variable can be set in your ~/.bashrc file: export R_LIBS_USER=$GIBBS_PROJECT/R/%v where %v is a placeholder for the R major and minor version number (e.g. 4.2 ). R will replace this variable with the correct value automatically to segregate packages installed with different versions of R. We recommend you install packages in an interactive job with the slurm option -C oldest . This will ensure the compiled portions of your R library are compatible with all compute nodes on the cluster. If there is a missing library your package of interest needs you should be able to load its module. If you cannot find a dependency or have trouble installing an R package, please get in touch with us . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always install packages in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. To get started load the R module and start R: salloc module load R/4.2.0-foss-2020b R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) This will throw a warning like: Warning in install.packages ( \"lattice\" ) : 'lib = \"/ysm-gpfs/apps/software/R/4.2.0-foss-2020b/lib64/R/library\"' is not writable Would you like to create a personal library /gpfs/gibbs/project/support/tl397/R/4.1 to install packages into? ( y/n ) Note If you encounter a permission error because the installation does not prompt you to create a personal library, create the directory in the default location in your home directory for the version of R you are using; e.g., mkdir /path/to/your/project/space/R/4.2 You only need the general minor version such as 4.2 instead of 4.2.2. You can customize where packages are installed and accessed for a particular R session using the .libPaths function in R: # List current package locations > .libPaths() # Add new default location to the standard defaults, e.g. 
project/my_R_libs > .libPaths(c(\"/home/netID/project/my_R_libs/\", .libPaths()))","title":"Install Packages"},{"location":"clusters-at-yale/guides/r/#run-r","text":"We will kill R jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your R computation in jobs. See our Slurm documentation for more detailed information on submitting jobs.","title":"Run R"},{"location":"clusters-at-yale/guides/r/#interactive-job","text":"To run R interactively, first launch an interactive job on a compute node. If your R sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start R by entering R on your command prompt. If you are happy with your R commands, save them to a file which can then be submitted and run as a batch job.","title":"Interactive Job"},{"location":"clusters-at-yale/guides/r/#batch-mode","text":"To run R in batch mode, create a plain-text batch script to submit. In that script, you can run your R script. In this case myscript.R is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_r_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load R/4.1.0-foss-2020b Rscript myscript.R To actually submit the job, run sbatch my_r_job.sh where the batch script above was saved as my_r_job.sh .","title":"Batch Mode"},{"location":"clusters-at-yale/guides/r/#rstudio","text":"You can run RStudio app via Open Ondemand . Here you can select the desired version of R and RStudio and launch an interactive compute session.","title":"RStudio"},{"location":"clusters-at-yale/guides/r/#parallel-r","text":"On a cluster you may want to use R in parallel across multiple nodes. While there are a few different ways this can be achieved, we recommend using the R software module which already includes Rmpi , parallel , and doMC . To test it, we can create a simple R script named ex1.R library ( \"Rmpi\" ) n <- mpi.comm.size ( 0 ) me <- mpi.comm.rank ( 0 ) mpi.barrier ( 0 ) val <- 777 mpi.bcast ( val , 1 , 0 , 0 ) print ( paste ( \"me\" , me , \"val\" , val )) mpi.barrier ( 0 ) mpi.quit () Then we can launch it with an sbatch script ( ex1.sh ): #!/bin/bash #SBATCH -n 4 #SBATCH -t 5:00 module purge module load R/4.1.0-foss-2020b srun Rscript ex1.R This script should execute a simple broadcast and complete in a few seconds.","title":"Parallel R"},{"location":"clusters-at-yale/guides/r/#virtual-display-session","text":"It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". There is a tool, xvfb , which can help avoid these issues. The wrapper xvfb-run creates a virtual display session which allows R to create these figures without an X11 session. See the guide for xvfb for more details.","title":"Virtual Display Session"},{"location":"clusters-at-yale/guides/r/#conda-based-r-environments","text":"If there isn't a module available for the version of R you want, you can set up your own R installation using Conda . With Conda you can manage your own packages and dependencies, for R, Python, etc. Most of the time the best way to install R packages for your Conda R environment is via conda . 
# load miniconda module load miniconda # create the conda environment including r-base and r-essentials package collections conda create --name my_r_env r-base r-essentials # activate the environment conda activate my_r_env # Install the lattice package (r-lattice) conda install r-lattice If there are packages that conda does not provide, you can install using the install.packages function, but this may occasionally not work as well. When you install packages with install.packages make sure to activate your Conda environment first. salloc module load miniconda source activate my_r_env R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) Warning Conda-based R may not work properly with parallel packages like Rmpi when running across multiple compute nodes. In general, it's best to use the module installation of R for anything which requires MPI.","title":"Conda-based R Environments"},{"location":"clusters-at-yale/guides/rclone/","text":"Rclone rclone is a command line tool to sync files and directories to and from all major cloud storage sites. You can use rclone to sync files and directories between Yale clusters and Yale Box, google drive, etc. The following instructions cover basics to setup and use rclone on Yale clusters. For more information about Rclone, please visit its website at https://rclone.org . Set up Rclone on YCRC clusters Before accessing a remote cloud storage using rclone , you need to run rclone config to configure the storage for rclone . Since rclone config will try to bring up a browser for you to authorize the cloud storage, we recommend you to use Open OnDemand . To run rclone config on OOD, first click Remote Desktop from the OOD dashboard. Once a session starts running, click Connect to Remote Desktop and you will see a terminal on the desktop in the browser. Run rclone config at the command line of the terminal. During configuration, you will see a message similar to the following: If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... If no browser started automatically, then start Firefox manually by clicking the Firefox icon on the top bar of the Remote Desktop. Copy the link from the message shown on your screen and paste it to the address bar of Firefox. Log in with your Yale email address, respond to the DUO request, and authorize the access. Tip If you received an error stating that your session has expired for DUO, simply paste the link and reload the page. If you still get the expired message, log out of CAS in your browser by going to https://secure.its.yale.edu/cas/logout. After logging out, paste the link and reload. Examples The following examples show you how to set up rclone for a viriety of different storage types. In the examples, we name our remote cloud storage as 'remote' in the configuration. You can provide any name you want. Google Drive The example below is a screen dump when setting up rclone for Google Drive. [ pl543@c03n06 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" 2 / Alias for an existing remote \\ \"alias\" [ ... ] 15 / Google Drive \\ \"drive\" [ ... 
] 42 / seafile \\ \"seafile\" Storage> 15 ** See help for drive backend at: https://rclone.org/drive/ ** Google Application Client Id Setting your own is recommended. See https://rclone.org/drive/#making-your-own-client-id for how to create your own. If you leave this blank, it will use an internal key which is low performance. Enter a string value. Press Enter for the default ( \"\" ) . client_id> OAuth Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Scope that rclone should use when requesting access from drive. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / Full access all files, excluding Application Data Folder. \\ \"drive\" 2 / Read-only access to file metadata and file contents. \\ \"drive.readonly\" / Access to files created by rclone only. 3 | These are visible in the drive website. | File authorization is revoked when the user deauthorizes the app. \\ \"drive.file\" / Allows read and write access to the Application Data folder. 4 | This is not visible in the drive website. \\ \"drive.appfolder\" / Allows read-only access to file metadata but 5 | does not allow any access to read or download file content. \\ \"drive.metadata.readonly\" scope> 1 ID of the root folder Leave blank normally. Fill in to access \"Computers\" folders ( see docs ) , or for rclone to use a non root folder as its starting point. Enter a string value. Press Enter for the default ( \"\" ) . root_folder_id> Service Account Credentials JSON file path Leave blank normally. Needed only if you want use SA instead of interactive login. Leading ` ~ ` will be expanded in the file name as will environment variables such as ` ${ RCLONE_CONFIG_DIR } ` . Enter a string value. Press Enter for the default ( \"\" ) . service_account_file> Edit advanced config? ( y/n ) y ) Yes n ) No ( default ) y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes ( default ) n ) No y/n> y If your browser doesn ' t open automatically go to the following link: http://127.0.0.1:53682/auth?state = 6glRr_mpEORxHevlOaaYyw Log in and authorize rclone for access Waiting for code... Got code Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> n -------------------- [ remote ] type = drive scope = drive token = { \"access_token\" : \"ya29.A0ArdaM-mBYFKBE2gieODvdANCZRV6Y8QHhQF-lY74E9fr1HTLOwwLRuoQQbO9P-Jdip62YYhqXfcuWT0KLKGdhUb9M8g2Z4XEQqoNLwZyA-FA2AAYYBqB\" , \"token_type\" : \"Bearer\" , \"refresh_token\" : \"1//0dDu3r6KVakgYIARAAGA0NwF-L9IrWIuG7_f44x-uLR2vvBocf4q-KnQVhlkm18TO2Fn0GjJp-cArWfj5kY84\" , \"expiry\" : \"2021-02-25T17:28:18.629507046-05 :00\" } -------------------- y ) Yes this is OK ( default ) e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote drive e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q Box The example below is a screen dump when setting up rclone for Yale Box. [ pl543@c14n07 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" [ ... ] 6 / Box \\ \"box\" [ ... 
] Storage> box ** See help for box backend at: https://rclone.org/box/ ** Box App Client Id. Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_id> Box App Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Edit advanced config? ( y/n ) y ) Yes n ) No y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes n ) No y/n> y If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... Got code -------------------- [ remote ] type = box token = { \"access_token\" : \"PjIXHUZ34VQSmeUZ9r6bhc9ux44KMU0e\" , \"token_type\" : \"bearer\" , \"refresh_token\" : \"VumWPWP5Nd0M2C1GyfgfJL51zUeWPPVLc6VC6lBQduEPsQ9a6ibSor2dvHmyZ6B8\" , \"expiry\" : \"2019-10-21T11:00:36.839586736-04:00\" } -------------------- y ) Yes this is OK e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote box e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q S3 The example below is a screen dump when setting up rclone for an S3 provider such as aws. [ rdb9@login1.mccleary ~ ] $ rclone config Enter configuration password: password: Current remotes: Name Type ==== ==== [ ... ] e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> n ``` bash Enter name for new remote. name> remote Option Storage. Type of storage to configure. Choose a number from below, or type in your own value. [ ... ] 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, DigitalOcean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, IONOS Cloud, Liara, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS, Qiniu and Wasabi \\ ( s3 ) [ ... ] Storage> 5 Option provider. Choose your S3 provider. Choose a number from below, or type in your own value. Press Enter to leave empty. 1 / Amazon Web Services ( AWS ) S3 \\ ( AWS ) [ ... ] provider> 1 Option env_auth. Get AWS credentials from runtime ( environment variables or EC2/ECS meta data if no env vars ) . Only applies if access_key_id and secret_access_key is blank. Choose a number from below, or type in your own boolean value ( true or false ) . Press Enter for the default ( false ) . 1 / Enter AWS credentials in the next step. \\ ( false ) 2 / Get AWS credentials from the environment ( env vars or IAM ) . \\ ( true ) env_auth> Option access_key_id. AWS Access Key ID. Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. access_key_id> *************** Option secret_access_key. AWS Secret Access Key ( password ) . Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. secret_access_key> ************* Option region. Region to connect to. Choose a number from below, or type in your own value. Press Enter to leave empty. / The default endpoint - a good choice if you are unsure. 1 | US Region, Northern Virginia, or Pacific Northwest. | Leave location constraint empty. \\ ( us-east-1 ) / US East ( Ohio ) Region. [ ... ] [ take defaults for all remaining questions Edit advanced config? y ) Yes n ) No ( default ) y/n> n Configuration complete. 
Options: - type: s3 - provider: AWS - access_key_id: *************** - secret_access_key: **************** - region: us-east-1 Tip if you want to use rclone for a shared google drive, you should answer 'y' when it asks whether you want to configure it as a Shared Drive. Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> y Tip rclone config creates a file storing cloud storage configurations for rclone. You can check the file name with rclone config file . The config file can be copied to other clusters so that you can use rclone on the other clusters without running rclone config again. Use Rclone on Yale clusters As we have used remote as the name of the cloud storage in our examples above, we will continue using it in the following examples. You should replace it with the actual name you have picked up for the cloud storage in your configuration. Tip If you forgot the name of the cloud storage you have configured, run rclone config show and the name will be shown in [] . $ rclone config show [ remote ] type = drive scope = drive token = { \"access_token\" : \"mytoken\" , \"expiry\" : \"2021-07-09T22:13:56.452750648-04:00\" } root_folder_id = myid List files rclone ls remote:/ Copy files # to download a file to the cluster rclone copy remote:/path/to/filename . # to upload a file from the cluster to the cloud storage rclone copy filename remote:/path/to/ Help rclone help","title":"Rclone"},{"location":"clusters-at-yale/guides/rclone/#rclone","text":"rclone is a command line tool to sync files and directories to and from all major cloud storage sites. You can use rclone to sync files and directories between Yale clusters and Yale Box, google drive, etc. The following instructions cover basics to setup and use rclone on Yale clusters. For more information about Rclone, please visit its website at https://rclone.org .","title":"Rclone"},{"location":"clusters-at-yale/guides/rclone/#set-up-rclone-on-ycrc-clusters","text":"Before accessing a remote cloud storage using rclone , you need to run rclone config to configure the storage for rclone . Since rclone config will try to bring up a browser for you to authorize the cloud storage, we recommend you to use Open OnDemand . To run rclone config on OOD, first click Remote Desktop from the OOD dashboard. Once a session starts running, click Connect to Remote Desktop and you will see a terminal on the desktop in the browser. Run rclone config at the command line of the terminal. During configuration, you will see a message similar to the following: If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... If no browser started automatically, then start Firefox manually by clicking the Firefox icon on the top bar of the Remote Desktop. Copy the link from the message shown on your screen and paste it to the address bar of Firefox. Log in with your Yale email address, respond to the DUO request, and authorize the access. Tip If you received an error stating that your session has expired for DUO, simply paste the link and reload the page. If you still get the expired message, log out of CAS in your browser by going to https://secure.its.yale.edu/cas/logout. After logging out, paste the link and reload.","title":"Set up Rclone on YCRC clusters"},{"location":"clusters-at-yale/guides/rclone/#examples","text":"The following examples show you how to set up rclone for a viriety of different storage types. 
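Beyond the ls and copy commands shown above, rclone's standard sync subcommand together with the --dry-run and --progress flags is often useful for larger transfers. The sketch below assumes a remote named 'remote' as in the examples and uses placeholder paths.

# preview what a sync would change without transferring anything
rclone sync --dry-run /path/to/local/dir remote:/path/to/dir
# perform the sync and report progress; sync makes the destination match the source,
# deleting files on the remote that are not present locally, so preview first
rclone sync --progress /path/to/local/dir remote:/path/to/dir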
In the examples, we name our remote cloud storage as 'remote' in the configuration. You can provide any name you want. Google Drive The example below is a screen dump when setting up rclone for Google Drive. [ pl543@c03n06 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" 2 / Alias for an existing remote \\ \"alias\" [ ... ] 15 / Google Drive \\ \"drive\" [ ... ] 42 / seafile \\ \"seafile\" Storage> 15 ** See help for drive backend at: https://rclone.org/drive/ ** Google Application Client Id Setting your own is recommended. See https://rclone.org/drive/#making-your-own-client-id for how to create your own. If you leave this blank, it will use an internal key which is low performance. Enter a string value. Press Enter for the default ( \"\" ) . client_id> OAuth Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Scope that rclone should use when requesting access from drive. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / Full access all files, excluding Application Data Folder. \\ \"drive\" 2 / Read-only access to file metadata and file contents. \\ \"drive.readonly\" / Access to files created by rclone only. 3 | These are visible in the drive website. | File authorization is revoked when the user deauthorizes the app. \\ \"drive.file\" / Allows read and write access to the Application Data folder. 4 | This is not visible in the drive website. \\ \"drive.appfolder\" / Allows read-only access to file metadata but 5 | does not allow any access to read or download file content. \\ \"drive.metadata.readonly\" scope> 1 ID of the root folder Leave blank normally. Fill in to access \"Computers\" folders ( see docs ) , or for rclone to use a non root folder as its starting point. Enter a string value. Press Enter for the default ( \"\" ) . root_folder_id> Service Account Credentials JSON file path Leave blank normally. Needed only if you want use SA instead of interactive login. Leading ` ~ ` will be expanded in the file name as will environment variables such as ` ${ RCLONE_CONFIG_DIR } ` . Enter a string value. Press Enter for the default ( \"\" ) . service_account_file> Edit advanced config? ( y/n ) y ) Yes n ) No ( default ) y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes ( default ) n ) No y/n> y If your browser doesn ' t open automatically go to the following link: http://127.0.0.1:53682/auth?state = 6glRr_mpEORxHevlOaaYyw Log in and authorize rclone for access Waiting for code... Got code Configure this as a Shared Drive ( Team Drive ) ? 
y ) Yes n ) No ( default ) y/n> n -------------------- [ remote ] type = drive scope = drive token = { \"access_token\" : \"ya29.A0ArdaM-mBYFKBE2gieODvdANCZRV6Y8QHhQF-lY74E9fr1HTLOwwLRuoQQbO9P-Jdip62YYhqXfcuWT0KLKGdhUb9M8g2Z4XEQqoNLwZyA-FA2AAYYBqB\" , \"token_type\" : \"Bearer\" , \"refresh_token\" : \"1//0dDu3r6KVakgYIARAAGA0NwF-L9IrWIuG7_f44x-uLR2vvBocf4q-KnQVhlkm18TO2Fn0GjJp-cArWfj5kY84\" , \"expiry\" : \"2021-02-25T17:28:18.629507046-05 :00\" } -------------------- y ) Yes this is OK ( default ) e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote drive e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q Box The example below is a screen dump when setting up rclone for Yale Box. [ pl543@c14n07 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" [ ... ] 6 / Box \\ \"box\" [ ... ] Storage> box ** See help for box backend at: https://rclone.org/box/ ** Box App Client Id. Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_id> Box App Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Edit advanced config? ( y/n ) y ) Yes n ) No y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes n ) No y/n> y If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... Got code -------------------- [ remote ] type = box token = { \"access_token\" : \"PjIXHUZ34VQSmeUZ9r6bhc9ux44KMU0e\" , \"token_type\" : \"bearer\" , \"refresh_token\" : \"VumWPWP5Nd0M2C1GyfgfJL51zUeWPPVLc6VC6lBQduEPsQ9a6ibSor2dvHmyZ6B8\" , \"expiry\" : \"2019-10-21T11:00:36.839586736-04:00\" } -------------------- y ) Yes this is OK e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote box e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q S3 The example below is a screen dump when setting up rclone for an S3 provider such as aws. [ rdb9@login1.mccleary ~ ] $ rclone config Enter configuration password: password: Current remotes: Name Type ==== ==== [ ... ] e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> n ``` bash Enter name for new remote. name> remote Option Storage. Type of storage to configure. Choose a number from below, or type in your own value. [ ... ] 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, DigitalOcean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, IONOS Cloud, Liara, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS, Qiniu and Wasabi \\ ( s3 ) [ ... ] Storage> 5 Option provider. Choose your S3 provider. Choose a number from below, or type in your own value. Press Enter to leave empty. 1 / Amazon Web Services ( AWS ) S3 \\ ( AWS ) [ ... ] provider> 1 Option env_auth. 
Get AWS credentials from runtime ( environment variables or EC2/ECS meta data if no env vars ) . Only applies if access_key_id and secret_access_key is blank. Choose a number from below, or type in your own boolean value ( true or false ) . Press Enter for the default ( false ) . 1 / Enter AWS credentials in the next step. \\ ( false ) 2 / Get AWS credentials from the environment ( env vars or IAM ) . \\ ( true ) env_auth> Option access_key_id. AWS Access Key ID. Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. access_key_id> *************** Option secret_access_key. AWS Secret Access Key ( password ) . Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. secret_access_key> ************* Option region. Region to connect to. Choose a number from below, or type in your own value. Press Enter to leave empty. / The default endpoint - a good choice if you are unsure. 1 | US Region, Northern Virginia, or Pacific Northwest. | Leave location constraint empty. \\ ( us-east-1 ) / US East ( Ohio ) Region. [ ... ] [ take defaults for all remaining questions Edit advanced config? y ) Yes n ) No ( default ) y/n> n Configuration complete. Options: - type: s3 - provider: AWS - access_key_id: *************** - secret_access_key: **************** - region: us-east-1 Tip if you want to use rclone for a shared google drive, you should answer 'y' when it asks whether you want to configure it as a Shared Drive. Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> y Tip rclone config creates a file storing cloud storage configurations for rclone. You can check the file name with rclone config file . The config file can be copied to other clusters so that you can use rclone on the other clusters without running rclone config again.","title":"Examples"},{"location":"clusters-at-yale/guides/rclone/#use-rclone-on-yale-clusters","text":"As we have used remote as the name of the cloud storage in our examples above, we will continue using it in the following examples. You should replace it with the actual name you have picked up for the cloud storage in your configuration. Tip If you forgot the name of the cloud storage you have configured, run rclone config show and the name will be shown in [] . $ rclone config show [ remote ] type = drive scope = drive token = { \"access_token\" : \"mytoken\" , \"expiry\" : \"2021-07-09T22:13:56.452750648-04:00\" } root_folder_id = myid","title":"Use Rclone on Yale clusters"},{"location":"clusters-at-yale/guides/rclone/#list-files","text":"rclone ls remote:/","title":"List files"},{"location":"clusters-at-yale/guides/rclone/#copy-files","text":"# to download a file to the cluster rclone copy remote:/path/to/filename . # to upload a file from the cluster to the cloud storage rclone copy filename remote:/path/to/","title":"Copy files"},{"location":"clusters-at-yale/guides/rclone/#help","text":"rclone help","title":"Help"},{"location":"clusters-at-yale/guides/tmux/","text":"tmux tmux is a \"terminal multiplexer\", it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. tmux is a great way to save an interactive session between connections you make to the clusters. You can reconnect to the session from a workstation in your lab or from your laptop from home! 
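If you are not sure whether you already have a session running on the login node you are connected to, tmux can list them for you. A minimal sketch (the session name analysis is just an example):
# show existing sessions on this login node
tmux ls
# reattach to one of them by name
tmux attach -t analysis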
Get Started To begin a tmux session named myproject, type tmux new -s myproject You should see a bar across the bottom of your terminal window now that gives you some information about your session. If you are disconnected or detached from this session, anything you were doing will still be there waiting when you reattach The most important shortcut to remember is Ctrl + b (hold the ctrl or control key, then type \"b\"). This is how you signal to tmux that the following keystroke is meant for it and not the session you are working in. For example: if you want to gracefully detach from your session, you can type Ctrl + b , then d for detach. To reattach to our sample tmux session after detatching, type: tmux attach -t myproject #If you are lazy and have only one session running, #This works too: tmux a Lines starting with a \"#\" denote a commented line, which aren't read as code Finally, to exit, you can type exit or Ctrl + d tmux on the Clusters Using tmux on the cluster allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. salloc ) then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly: ssh to your cluster of choice Start tmux Inside your tmux session, submit an interactive job with salloc . See the Slurm documentation for more details Inside your job allocation (on a compute node), start your application (e.g. matlab) Detach from tmux by typing Ctrl + b then d Later, on the same login node, reattach by running tmux attach Make sure to: run tmux on the login node, NOT on compute nodes run salloc inside tmux, not the reverse. Warning Every cluster has two login nodes. If you cannot find your tmux session, it might be running on the other node. Check the hostname of your current login node (from either your command prompt or from running hostname -s ), then use ssh to login to the other one. For example, if you are logged in to grace1, use ssh -Y grace2 to reach the other login node. Windows and Panes tmux allows you to create, toggle between and manipulate panes and windows in your session. A window is the whole screen that tmux displays to you. Panes are subdivisions in the curent window, where each runs an independent terminal. Especially at first, you probably won't need more than one pane at a time. Multiple windows can be created and run off-screen. Here is an example where this may be useful. Say you just submitted an interactive job that is running on a compute node inside your tmux session. [ ms725@grace1 ~ ] $ tmux new -s analysis # I am in my tmux session now [ ms725@grace1 ~ ] $ salloc [ ms725@c14n02 ~ ] $ ./my_fancy_analysis.sh Now you can easily monitor its CPU and memory utilization without ever taking your eyes off of it by creating a new pane and running top there. Split your window by typing: Ctrl + b then % ssh into the compute node you are working on, then run top to watch your work as it runs all from the same window. # I'm in a new pane now. [ ms725@grace1 ~ ] $ ssh c14n02 [ ms725@c14n02 ~ ] $ top Your view will look something like this: To switch back and forth between panes, type Ctrl + b then o","title":"tmux"},{"location":"clusters-at-yale/guides/tmux/#tmux","text":"tmux is a \"terminal multiplexer\", it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. 
tmux is a great way to save an interactive session between connections you make to the clusters. You can reconnect to the session from a workstation in your lab or from your laptop from home!","title":"tmux"},{"location":"clusters-at-yale/guides/tmux/#get-started","text":"To begin a tmux session named myproject, type tmux new -s myproject You should see a bar across the bottom of your terminal window now that gives you some information about your session. If you are disconnected or detached from this session, anything you were doing will still be there waiting when you reattach The most important shortcut to remember is Ctrl + b (hold the ctrl or control key, then type \"b\"). This is how you signal to tmux that the following keystroke is meant for it and not the session you are working in. For example: if you want to gracefully detach from your session, you can type Ctrl + b , then d for detach. To reattach to our sample tmux session after detatching, type: tmux attach -t myproject #If you are lazy and have only one session running, #This works too: tmux a Lines starting with a \"#\" denote a commented line, which aren't read as code Finally, to exit, you can type exit or Ctrl + d","title":"Get Started"},{"location":"clusters-at-yale/guides/tmux/#tmux-on-the-clusters","text":"Using tmux on the cluster allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. salloc ) then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly: ssh to your cluster of choice Start tmux Inside your tmux session, submit an interactive job with salloc . See the Slurm documentation for more details Inside your job allocation (on a compute node), start your application (e.g. matlab) Detach from tmux by typing Ctrl + b then d Later, on the same login node, reattach by running tmux attach Make sure to: run tmux on the login node, NOT on compute nodes run salloc inside tmux, not the reverse. Warning Every cluster has two login nodes. If you cannot find your tmux session, it might be running on the other node. Check the hostname of your current login node (from either your command prompt or from running hostname -s ), then use ssh to login to the other one. For example, if you are logged in to grace1, use ssh -Y grace2 to reach the other login node.","title":"tmux on the Clusters"},{"location":"clusters-at-yale/guides/tmux/#windows-and-panes","text":"tmux allows you to create, toggle between and manipulate panes and windows in your session. A window is the whole screen that tmux displays to you. Panes are subdivisions in the curent window, where each runs an independent terminal. Especially at first, you probably won't need more than one pane at a time. Multiple windows can be created and run off-screen. Here is an example where this may be useful. Say you just submitted an interactive job that is running on a compute node inside your tmux session. [ ms725@grace1 ~ ] $ tmux new -s analysis # I am in my tmux session now [ ms725@grace1 ~ ] $ salloc [ ms725@c14n02 ~ ] $ ./my_fancy_analysis.sh Now you can easily monitor its CPU and memory utilization without ever taking your eyes off of it by creating a new pane and running top there. 
Split your window by typing: Ctrl + b then % ssh into the compute node you are working on, then run top to watch your work as it runs all from the same window. # I'm in a new pane now. [ ms725@grace1 ~ ] $ ssh c14n02 [ ms725@c14n02 ~ ] $ top Your view will look something like this: To switch back and forth between panes, type Ctrl + b then o","title":"Windows and Panes"},{"location":"clusters-at-yale/guides/vasp/","text":"VASP Note VASP requires a paid license. If you wish to use VASP on the cluster and your research group has purchased a license, please contact us to gain access to the cluster installation. Thank you for your cooperation. VASP and Slurm In Slurm, there is big difference between --ntasks and --cpus-per-task which is explained in our Requesting Resources documentation . For the purposes of VASP, --ntasks-per-node should always equal NCORE (in your INCAR file). Then --nodes should be equal to the total number of cores you want, divided by --ntasks-per-node . VASP has two parameters for controlling processor layouts, NCORE and NPAR , but you only need to set one of them. If you set NCORE , you don\u2019t need to set NPAR . Instead VASP will automatically set NPAR . In your mpirun line, you should specify the number of MPI tasks as: mpirun -n $SLURM_NTASKS vasp_std Cores Layout Examples If you want 40 cores (2 nodes and 20 cpus per node): in your submission script: #SBATCH --nodes=2 #SBATCH --ntasks-per-node=20 mpirun -n 2 vasp_std in INCAR : NCORE=20 You may however find that the wait time to get 20 cores on two nodes can be very long since cores request via --cpus-per-task can\u2019t span multiple nodes. Instead you might want to try breaking it up into smaller chunks. Therefore, try: in your submission script: #SBATCH --nodes=4 #SBATCH --ntasks-per-node=10 mpirun -n 4 vasp_std in INCAR : NCORE=10 which would likely spread over 4 nodes using 10 cores each and spend less time in the queue. Grace mpi partition On Grace's mpi parttion, since cores are assigned as whole 24-core nodes, NCORE should always be equal to 24 and then you can just request ntasks in multiples of 24. in your submission script: #SBATCH --ntasks=48 # some multiple of 24 mpirun -n $SLURM_NTASKS vasp_std in INCAR : NCORE=24 Additional Performance Some users have found that if they actually assign 2 MPI tasks per node (rather than 1), they see even better performance because the MPI tasks doesn't span the two sockets on the node. To try this, set NCORE to half of your nodes' core count and increase mpirun -n to twice the number of nodes you requested. Additional Reading Here is some documentation on how to optimally configure NCORE and NPAR: https://www.vasp.at/wiki/index.php/NCORE https://www.vasp.at/wiki/index.php/NPAR https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/","title":"VASP"},{"location":"clusters-at-yale/guides/vasp/#vasp","text":"Note VASP requires a paid license. If you wish to use VASP on the cluster and your research group has purchased a license, please contact us to gain access to the cluster installation. Thank you for your cooperation.","title":"VASP"},{"location":"clusters-at-yale/guides/vasp/#vasp-and-slurm","text":"In Slurm, there is big difference between --ntasks and --cpus-per-task which is explained in our Requesting Resources documentation . For the purposes of VASP, --ntasks-per-node should always equal NCORE (in your INCAR file). Then --nodes should be equal to the total number of cores you want, divided by --ntasks-per-node . 
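For example, matching the 40-core layout worked through elsewhere on this page: with NCORE=20 in the INCAR, the request comes out to 40 / 20 = 2 nodes. A sketch of the corresponding directives:
#SBATCH --nodes=2             # total cores (40) divided by ntasks-per-node
#SBATCH --ntasks-per-node=20  # equal to NCORE in the INCAR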
VASP has two parameters for controlling processor layouts, NCORE and NPAR , but you only need to set one of them. If you set NCORE , you don\u2019t need to set NPAR . Instead VASP will automatically set NPAR . In your mpirun line, you should specify the number of MPI tasks as: mpirun -n $SLURM_NTASKS vasp_std","title":"VASP and Slurm"},{"location":"clusters-at-yale/guides/vasp/#cores-layout-examples","text":"If you want 40 cores (2 nodes and 20 cpus per node): in your submission script: #SBATCH --nodes=2 #SBATCH --ntasks-per-node=20 mpirun -n 2 vasp_std in INCAR : NCORE=20 You may however find that the wait time to get 20 cores on two nodes can be very long since cores request via --cpus-per-task can\u2019t span multiple nodes. Instead you might want to try breaking it up into smaller chunks. Therefore, try: in your submission script: #SBATCH --nodes=4 #SBATCH --ntasks-per-node=10 mpirun -n 4 vasp_std in INCAR : NCORE=10 which would likely spread over 4 nodes using 10 cores each and spend less time in the queue.","title":"Cores Layout Examples"},{"location":"clusters-at-yale/guides/vasp/#grace-mpi-partition","text":"On Grace's mpi parttion, since cores are assigned as whole 24-core nodes, NCORE should always be equal to 24 and then you can just request ntasks in multiples of 24. in your submission script: #SBATCH --ntasks=48 # some multiple of 24 mpirun -n $SLURM_NTASKS vasp_std in INCAR : NCORE=24","title":"Grace mpi partition"},{"location":"clusters-at-yale/guides/vasp/#additional-performance","text":"Some users have found that if they actually assign 2 MPI tasks per node (rather than 1), they see even better performance because the MPI tasks doesn't span the two sockets on the node. To try this, set NCORE to half of your nodes' core count and increase mpirun -n to twice the number of nodes you requested.","title":"Additional Performance"},{"location":"clusters-at-yale/guides/vasp/#additional-reading","text":"Here is some documentation on how to optimally configure NCORE and NPAR: https://www.vasp.at/wiki/index.php/NCORE https://www.vasp.at/wiki/index.php/NPAR https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/","title":"Additional Reading"},{"location":"clusters-at-yale/guides/virtualgl/","text":"VirtualGL Why VirtualGL To display a 3D application running remotely on a cluster, you could use X11 forwarding to display the application on your local machine. This is usually very slow and often unusable. An alternative approach is to use VNC - also called Remote Desktop - to run GUI applications remotely on the cluster. This approach only works well with applications that only need moderate 3D rendering where software rendering is good enough. For applications that need to render large complicated models, hardware accelerated 3D rendering must be used. However, VNC cannot directly utilize the graphic devices on the cluster for rendering. VirtualGL , in conjunction with VNC, provides a commonly used solution for remote 3D rendering with hardware acceleration. How to use VirtualGL VirtualGL 3.0+ supports the traditional GLX back end and the new EGL back end for 3D rendering. The EGL back end uses a DRI (Direct Rendering Infrastructure) device to access a graphics device, while the GLX back end uses an X server to access a graphics device. The EGL back end allows simultaneous jobs on the same node, each using their own dedicated GPU device for rendering. Although it can render many applications properly, the EGL back end may fail to render some applications. 
The GLX back end supports a wider range of OpenGL applications than the EGL back end, however, only one X server can work properly with the graphics devices on the node. This means only one job can use the GLX back end on any GPU node, no matter how many GPU devices the node has. We suggest you use the EGL back end first. If it does not render your application properly, then switch to the GLX back end. We have provided a wrapper script ycrc_vglrun to make it easy for you to choose which back end to use for 3D rendering. In the following examples, we will use ParaView (unless mentioned otherwise) to demonstrate how to use ycrc_vglrun . Note If you need to run a hardware accelerated GUI application, you should first start a Remote Desktop on a GPU node, and then run the application from the shell in the Remote Desktop as shown below. We have not incorporated VirtualGL into the standalone interactive Apps on OOD that could benefit from VirtualGL. However, this could change in the future. Use VirtualGL with the EGL back end EGL is the default back end which ycrc_vglrun will choose to use if no option is provided. You can also add the -e option to choose the EGL back end explicitly. module load ParaView ycrc_vglrun paraview module load ParaView ycrc_vglrun -e paraview Use VirtualGL with the GLX back end If your application cannot be rendered properly with the EGL back end, your next step is to try the GLX back end. You should choose it explicitly with the -g option. module load ParaView ycrc_vglrun -g paraview Run MATLAB with hardware OpenGL rendering By default, MATLAB will use software OpenGL rendering. To run MATLAB with hardware OpenGL rendering, add -nosoftwareopengl . module load MATLAB ycrc_vglrun matlab -nosoftwareopengl Troubleshoot nvidia-smi or vglrun cannot be found You must submit your job to a GPU node. If you are using the Remote Desktop from OOD, make sure you have specified gpu as 1 and partition as gpu or any other partition with GPU nodes. GLX back end is used by another application If you get the following message when running your application with the GLX back end, you need to add --exclude=nodename to Advanced options in the Remote Desktop OOD user interface and resubmit Remote Desktop. Replace nodename with the actual node name from the message. VirtualGL with the GLX back end is currently used by another application. Please resubmit your job with --exclude = c22n01","title":"VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#virtualgl","text":"","title":"VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#why-virtualgl","text":"To display a 3D application running remotely on a cluster, you could use X11 forwarding to display the application on your local machine. This is usually very slow and often unusable. An alternative approach is to use VNC - also called Remote Desktop - to run GUI applications remotely on the cluster. This approach only works well with applications that only need moderate 3D rendering where software rendering is good enough. For applications that need to render large complicated models, hardware accelerated 3D rendering must be used. However, VNC cannot directly utilize the graphic devices on the cluster for rendering. 
VirtualGL , in conjunction with VNC, provides a commonly used solution for remote 3D rendering with hardware acceleration.","title":"Why VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#how-to-use-virtualgl","text":"VirtualGL 3.0+ supports the traditional GLX back end and the new EGL back end for 3D rendering. The EGL back end uses a DRI (Direct Rendering Infrastructure) device to access a graphics device, while the GLX back end uses an X server to access a graphics device. The EGL back end allows simultaneous jobs on the same node, each using their own dedicated GPU device for rendering. Although it can render many applications properly, the EGL back end may fail to render some applications. The GLX back end supports a wider range of OpenGL applications than the EGL back end, however, only one X server can work properly with the graphics devices on the node. This means only one job can use the GLX back end on any GPU node, no matter how many GPU devices the node has. We suggest you use the EGL back end first. If it does not render your application properly, then switch to the GLX back end. We have provided a wrapper script ycrc_vglrun to make it easy for you to choose which back end to use for 3D rendering. In the following examples, we will use ParaView (unless mentioned otherwise) to demonstrate how to use ycrc_vglrun . Note If you need to run a hardware accelerated GUI application, you should first start a Remote Desktop on a GPU node, and then run the application from the shell in the Remote Desktop as shown below. We have not incorporated VirtualGL into the standalone interactive Apps on OOD that could benefit from VirtualGL. However, this could change in the future.","title":"How to use VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#use-virtualgl-with-the-egl-back-end","text":"EGL is the default back end which ycrc_vglrun will choose to use if no option is provided. You can also add the -e option to choose the EGL back end explicitly. module load ParaView ycrc_vglrun paraview module load ParaView ycrc_vglrun -e paraview","title":"Use VirtualGL with the EGL back end"},{"location":"clusters-at-yale/guides/virtualgl/#use-virtualgl-with-the-glx-back-end","text":"If your application cannot be rendered properly with the EGL back end, your next step is to try the GLX back end. You should choose it explicitly with the -g option. module load ParaView ycrc_vglrun -g paraview","title":"Use VirtualGL with the GLX back end"},{"location":"clusters-at-yale/guides/virtualgl/#run-matlab-with-hardware-opengl-rendering","text":"By default, MATLAB will use software OpenGL rendering. To run MATLAB with hardware OpenGL rendering, add -nosoftwareopengl . module load MATLAB ycrc_vglrun matlab -nosoftwareopengl","title":"Run MATLAB with hardware OpenGL rendering"},{"location":"clusters-at-yale/guides/virtualgl/#troubleshoot","text":"","title":"Troubleshoot"},{"location":"clusters-at-yale/guides/virtualgl/#nvidia-smi-or-vglrun-cannot-be-found","text":"You must submit your job to a GPU node. 
If you are using the Remote Desktop from OOD, make sure you have specified gpu as 1 and partition as gpu or any other partition with GPU nodes.","title":"nvidia-smi or vglrun cannot be found"},{"location":"clusters-at-yale/guides/virtualgl/#glx-back-end-is-used-by-another-application","text":"If you get the following message when running your application with the GLX back end, you need to add --exclude=nodename to Advanced options in the Remote Desktop OOD user interface and resubmit Remote Desktop. Replace nodename with the actual node name from the message. VirtualGL with the GLX back end is currently used by another application. Please resubmit your job with --exclude = c22n01","title":"GLX back end is used by another application"},{"location":"clusters-at-yale/guides/xvfb/","text":"Virtual Frame Buffer for Batch Mode Often there is a need to run a program with a graphical interface in batch mode. This can be either due to extended run-time or the desire to run many instances of the process at once. In either case the lack of a display can prevent the program from running. A solution has been developed to create a virtual display that only lives in memory. This allows the program to happily launch its graphical interface while in batch mode. Note It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". xvfb can help avoid these issues. This tool is called the X Virtual Frame Buffer or xvfb . It can act as a wrapper to your script which creates a virtual display session. For example, to run an R script (e.g. make_jpeg.R ) which needs a display session in order to save a JPEG file: xvfb-run Rscript make_jpeg.R For more details and other examples see the xvfb-run man page by running man xvfb-run on any compute node.","title":"XVFB"},{"location":"clusters-at-yale/guides/xvfb/#virtual-frame-buffer-for-batch-mode","text":"Often there is a need to run a program with a graphical interface in batch mode. This can be either due to extended run-time or the desire to run many instances of the process at once. In either case the lack of a display can prevent the program from running. A solution has been developed to create a virtual display that only lives in memory. This allows the program to happily launch its graphical interface while in batch mode. Note It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". xvfb can help avoid these issues. This tool is called the X Virtual Frame Buffer or xvfb . It can act as a wrapper to your script which creates a virtual display session. For example, to run an R script (e.g. make_jpeg.R ) which needs a display session in order to save a JPEG file: xvfb-run Rscript make_jpeg.R For more details and other examples see the xvfb-run man page by running man xvfb-run on any compute node.","title":"Virtual Frame Buffer for Batch Mode"},{"location":"clusters-at-yale/job-scheduling/","text":"Run Jobs with Slurm Performing computational work at scale in a shared environment involves organizing everyone's work into jobs and scheduling them. We use Slurm to schedule and manage jobs on the YCRC clusters . Submitting a job involves specifying a resource request then running one or more commands or applications. 
These requests take the form of options to the command-line programs salloc and sbatch or those same options as directives inside submission scripts. Requests are made of groups of compute nodes (servers) called partitions. Partitions, their defaults, limits, and purposes are listed on each cluster page . Once submitted, jobs wait in a queue and are subject to several factors affecting scheduling priority . When your scheduled job begins, the commands or applications you specify are run on compute nodes the scheduler found to satisfy your resource request. If the job was submitted as a batch job, output normally printed to the screen will be saved to file. Please be a good cluster citizen. Do not run heavy computation on login nodes (e.g. grace1 , login1.mccleary ). Doing so negatively impacts everyone's ability to interact with the cluster. Make resource requests for your jobs that reflect what they will use. Wasteful job allocations slow down everyone's work on the clusters. See our documentation on Monitoring CPU and Memory Usage for how to measure job resource usage. If you plan to run many similar jobs, use our Dead Simple Queue tool or job arrays - we enforce limits on job submission rates on all clusters. If you find yourself wondering how best to schedule a job, please contact us for some help. Common Slurm Commands For an exhaustive list of commands and their official manuals, see the SchedMD Man Pages . Below are some of the most common commands used to interact with the scheduler. Submit a script called my_job.sh as a job ( see below for details): sbatch my_job.sh List your queued and running jobs: squeue --me Cancel a queued job or kill a running job, e.g. a job with ID 12345: scancel 12345 Check status of a job, e.g. a job with ID 12345: sacct -j 12345 Check how efficiently a job ran, e.g. a job with ID 12345: seff 12345 See our Monitor CPU and Memory page for more on tracking the resources your job actually uses. Common Job Request Options These options modify the size, length and behavior of jobs you submit. They can be specified when calling salloc or sbatch , or saved to a batch script . Options specified on the command line to sbatch will override those in a batch script. See our Request Compute Resources page for discussion on the differences between --ntasks and --cpus-per-task , constraints, GPUs, etc. If options are left unspecified defaults are used. Long Option Short Option Default Description --job-name -J Name of script Custom job name. --output -o \"slurm-%j.out\" Where to save stdout and stderr from the job. See filename patterns for more formatting options. --partition -p Varies by cluster Partition to run on. See individual cluster pages for details. --account -A Your group name Specify if you have access to multiple private partitions. --time -t Varies by partition Time limit for the job in D-HH:MM:SS, e.g. -t 1- is one day, -t 4:00:00 is 4 hours. --nodes -N 1 Total number of nodes. --ntasks -n 1 Number of tasks (MPI workers). --ntasks-per-node Scheduler decides Number of tasks per node. --cpus-per-task -c 1 Number of CPUs for each task. Use this for threads/cores in single-node jobs. --mem-per-cpu 5G Memory requested per CPU in MiB. Add G to specify GiB (e.g. 10G ). --mem Memory requested per node in MiB. Add G to specify GiB (e.g. 10G ). --gpus -G Used to request GPUs --constraint -C Constraints on node features. To limit kinds of nodes to run on. --mail-user Your Yale email Mail address (alternatively, put your email address in ~/.forward). 
--mail-type None Send email when jobs change state. Use ALL to receive email notifications at the beginning and end of the job. Interactive Jobs Interactive jobs can be used for testing and troubleshooting code. Requesting an interactive job will allocate resources and log you into a shell on a compute node. You can start an interactive job using the salloc command. Unless specified otherwise using the -p flag (see above), all salloc requests will go to the devel ( interactive on Milgram and Ruddle) partition on the cluster. For example, to request an interactive job with 8GB of RAM for 2 hours: salloc -t 2 :00:00 --mem = 8G This will assign one CPU and 8GiB of RAM to you for two hours. You can run commands in this shell as needed. To exit, you can type exit or Ctrl + d Use tmux with Interactive Sessions Remote sessions are vulnerable to being killed if you lose your network connection. We recommend using tmux alleviate this. When using tmux with interactive jobs, please take extra care to stop jobs that are no longer needed. Graphical applications Many graphical applications are well served with the Open OnDemand Remote Desktop app . If you would like to use X11 forwarding, first make sure it is installed and configured . Then, add the --x11 flag to an interactive job request: salloc --x11 Batch Jobs You can submit a script as a batch job, i.e. one that can be run non-interactively in batches. These submission scripts are comprised of three parts: A hashbang line specifying the program that runs the script. This is normally #!/bin/bash . Directives that list job request options. These lines must appear before any other commands or definitions, otherwise they will be ignored. The commands or applications you want executed during your job. See our page of Submission Script Examples for a few more, or the example scripts repo for more in-depth examples. Here is an example submission script that prints some job information and exits: #!/bin/bash #SBATCH --job-name=example_job #SBATCH --time=2:00:00 #SBATCH --mail-type=ALL module purge module load MATLAB/2021a matlab -batch \"your_script\" Save this file as example_job.sh , then submit it with: sbatch example_job.sh When the job finishes the output should be stored in a file called slurm-jobid.out , where jobid is the submitted job's ID. If you find yourself writing loops to submit jobs, instead use our Dead Simple Queue tool or job arrays .","title":"Run Jobs with Slurm"},{"location":"clusters-at-yale/job-scheduling/#run-jobs-with-slurm","text":"Performing computational work at scale in a shared environment involves organizing everyone's work into jobs and scheduling them. We use Slurm to schedule and manage jobs on the YCRC clusters . Submitting a job involves specifying a resource request then running one or more commands or applications. These requests take the form of options to the command-line programs salloc and sbatch or those same options as directives inside submission scripts. Requests are made of groups of compute nodes (servers) called partitions. Partitions, their defaults, limits, and purposes are listed on each cluster page . Once submitted, jobs wait in a queue and are subject to several factors affecting scheduling priority . When your scheduled job begins, the commands or applications you specify are run on compute nodes the scheduler found to satisfy your resource request. If the job was submitted as a batch job, output normally printed to the screen will be saved to file. Please be a good cluster citizen. 
Do not run heavy computation on login nodes (e.g. grace1 , login1.mccleary ). Doing so negatively impacts everyone's ability to interact with the cluster. Make resource requests for your jobs that reflect what they will use. Wasteful job allocations slow down everyone's work on the clusters. See our documentation on Monitoring CPU and Memory Usage for how to measure job resource usage. If you plan to run many similar jobs, use our Dead Simple Queue tool or job arrays - we enforce limits on job submission rates on all clusters. If you find yourself wondering how best to schedule a job, please contact us for some help.","title":"Run Jobs with Slurm"},{"location":"clusters-at-yale/job-scheduling/#common-slurm-commands","text":"For an exhaustive list of commands and their official manuals, see the SchedMD Man Pages . Below are some of the most common commands used to interact with the scheduler. Submit a script called my_job.sh as a job ( see below for details): sbatch my_job.sh List your queued and running jobs: squeue --me Cancel a queued job or kill a running job, e.g. a job with ID 12345: scancel 12345 Check status of a job, e.g. a job with ID 12345: sacct -j 12345 Check how efficiently a job ran, e.g. a job with ID 12345: seff 12345 See our Monitor CPU and Memory page for more on tracking the resources your job actually uses.","title":"Common Slurm Commands"},{"location":"clusters-at-yale/job-scheduling/#common-job-request-options","text":"These options modify the size, length and behavior of jobs you submit. They can be specified when calling salloc or sbatch , or saved to a batch script . Options specified on the command line to sbatch will override those in a batch script. See our Request Compute Resources page for discussion on the differences between --ntasks and --cpus-per-task , constraints, GPUs, etc. If options are left unspecified defaults are used. Long Option Short Option Default Description --job-name -J Name of script Custom job name. --output -o \"slurm-%j.out\" Where to save stdout and stderr from the job. See filename patterns for more formatting options. --partition -p Varies by cluster Partition to run on. See individual cluster pages for details. --account -A Your group name Specify if you have access to multiple private partitions. --time -t Varies by partition Time limit for the job in D-HH:MM:SS, e.g. -t 1- is one day, -t 4:00:00 is 4 hours. --nodes -N 1 Total number of nodes. --ntasks -n 1 Number of tasks (MPI workers). --ntasks-per-node Scheduler decides Number of tasks per node. --cpus-per-task -c 1 Number of CPUs for each task. Use this for threads/cores in single-node jobs. --mem-per-cpu 5G Memory requested per CPU in MiB. Add G to specify GiB (e.g. 10G ). --mem Memory requested per node in MiB. Add G to specify GiB (e.g. 10G ). --gpus -G Used to request GPUs --constraint -C Constraints on node features. To limit kinds of nodes to run on. --mail-user Your Yale email Mail address (alternatively, put your email address in ~/.forward). --mail-type None Send email when jobs change state. Use ALL to receive email notifications at the beginning and end of the job.","title":"Common Job Request Options"},{"location":"clusters-at-yale/job-scheduling/#interactive-jobs","text":"Interactive jobs can be used for testing and troubleshooting code. Requesting an interactive job will allocate resources and log you into a shell on a compute node. You can start an interactive job using the salloc command. 
Unless specified otherwise using the -p flag (see above), all salloc requests will go to the devel ( interactive on Milgram and Ruddle) partition on the cluster. For example, to request an interactive job with 8GB of RAM for 2 hours: salloc -t 2 :00:00 --mem = 8G This will assign one CPU and 8GiB of RAM to you for two hours. You can run commands in this shell as needed. To exit, you can type exit or Ctrl + d Use tmux with Interactive Sessions Remote sessions are vulnerable to being killed if you lose your network connection. We recommend using tmux alleviate this. When using tmux with interactive jobs, please take extra care to stop jobs that are no longer needed.","title":"Interactive Jobs"},{"location":"clusters-at-yale/job-scheduling/#graphical-applications","text":"Many graphical applications are well served with the Open OnDemand Remote Desktop app . If you would like to use X11 forwarding, first make sure it is installed and configured . Then, add the --x11 flag to an interactive job request: salloc --x11","title":"Graphical applications"},{"location":"clusters-at-yale/job-scheduling/#batch-jobs","text":"You can submit a script as a batch job, i.e. one that can be run non-interactively in batches. These submission scripts are comprised of three parts: A hashbang line specifying the program that runs the script. This is normally #!/bin/bash . Directives that list job request options. These lines must appear before any other commands or definitions, otherwise they will be ignored. The commands or applications you want executed during your job. See our page of Submission Script Examples for a few more, or the example scripts repo for more in-depth examples. Here is an example submission script that prints some job information and exits: #!/bin/bash #SBATCH --job-name=example_job #SBATCH --time=2:00:00 #SBATCH --mail-type=ALL module purge module load MATLAB/2021a matlab -batch \"your_script\" Save this file as example_job.sh , then submit it with: sbatch example_job.sh When the job finishes the output should be stored in a file called slurm-jobid.out , where jobid is the submitted job's ID. If you find yourself writing loops to submit jobs, instead use our Dead Simple Queue tool or job arrays .","title":"Batch Jobs"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/","text":"Common Job Failures Your jobs haven't failed, you have just found ways to run them that won't work. Here are some common error messages and steps to correct them. Memory Limits Jobs can fail due to an insufficient memory being requested. Depending on the job, this failure might present as a Slurm error: slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler. This means Slurm detected the job hitting the maximum requested memory and then the job was killed. When process inside a job tries to access memory outside what was allocated to that job (more than what you requested) the operating system tells your program that address is invalid with the fault Bus Error . A similar fault you might be more familiar with is a Segmentation Fault , which usually results from a program incorrectly trying to access a valid memory address. These errors can be fixed in two ways. 
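Before choosing a fix, it helps to check how much memory the job actually used. A quick sketch using the tools mentioned in the scheduling documentation (the job ID 12345 is illustrative):
# summary of how efficiently the job used its allocation
seff 12345
# requested memory vs. peak usage, per job step
sacct -j 12345 --format=JobID,State,ReqMem,MaxRSS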
Request More Memory The default is almost always --mem-per-cpu=5G In a batch script: #SBATCH --mem-per-cpu=8G In an interactive job: salloc --mem-per-cpu = 8G Use Less Memory This method is usually a little more involved, and is easier if you can inspect the code you are using. Watching your job's resource usage , attending a workshop , or getting in touch with us are good places to start. Disk Quotas Since the clusters are shared resources, we have quotas in place to enforce fair use of storage. When you or your group reach a quota, you can't write to existing files or create new ones. Any jobs that depend on creating or writing files that count toward the affected quota will fail. To inspect your current usage, run the command getquota . Remember, your home quota is yours but your project, scratch60, and any purchased storage quotas are shared across your group. Archive Files You may find that some files or direcories for previous projects are no longer needed on the cluster. We recommend you archive these to recover space. Delete Files If you are sure you no longer need some files or direcories, you can delete them. Unless files are in your home directory (not project or scratch60 ) they are not backed up and may be unrecoverable. Use the rm -rf command very carefully. Buy More Space If you would like to purchase more than the default quotas, we can help you buy space on the clusters . Rate Limits We rate-limit job submissions to 200 jobs per hour on each cluster. This limit helps even out load on the scheduler and encourages good practice. When you hit this limit, you will get an error when submitting new jobs that looks like this: sbatch: error: Reached jobs per hour limit sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) You will then need to wait until your submission rate drops. Use Job Arrays To avoid hitting this limit and make large numbers of jobs more manageable, you should use Dead Simple Queue or job arrays . If you need help adapting your workflow to dsq or job arrays contact us . Software Modules We build and organize software modules on the cluster using toolchains . The major toolchains we use produce modules that end in foss-yearletter or intel-yearletter, e.g. foss-2018b or intel-2018a . If modules from different toolchains are loaded at the same time, the conflicts that arise often lead to errors or strange application behavior. Seeing either of the following messages is a sign that you are loading incompatible modules. The following have been reloaded with a version change: 1) FFTW/3.3.7-gompi-2018a => FFTW/3.3.8-gompi-2018b 2) GCC/6.4.0-2.28 => GCC/7.3.0-2.3.0 3) GCCcore/6.4.0 => GCCcore/7.3.0 ... or GCCcore/7.3.0 exists but could not be loaded as requested. Match or Purge Your Toolchains Where possible, only use one toolchain at a time. When you want to use software from muliple toolchains run module purge between running new module load commands. If your work requires a version of software that is not installed, contact us . Conda Environments Conda environments provide a nice way to manage python and R packages and modules. Conda acieves this by setting functions and environment variables that point to your environment files when you run conda activate . Unlike modules , conda environments are not completely forwarded into a job; having a conda environment loaded when you submit a job doesn't forward it well into your job. 
You will likely see messages about missing packages and libraries you definitely installed into the environment you want to use in your job. Load Conda Environments Right Before Use To make sure that your environment is set up properly for interactive use, wait until you are on the host you plan to use your environment on. Then run conda activate my_env . To make sure batch jobs function properly, only submit jobs without an environment loaded ( conda deactivate before sbatch ). Make sure you load miniconda and your environment in the body of your batch submission script.","title":"Common Job Failures"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#common-job-failures","text":"Your jobs haven't failed, you have just found ways to run them that won't work. Here are some common error messages and steps to correct them.","title":"Common Job Failures"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#memory-limits","text":"Jobs can fail due to an insufficient memory being requested. Depending on the job, this failure might present as a Slurm error: slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler. This means Slurm detected the job hitting the maximum requested memory and then the job was killed. When process inside a job tries to access memory outside what was allocated to that job (more than what you requested) the operating system tells your program that address is invalid with the fault Bus Error . A similar fault you might be more familiar with is a Segmentation Fault , which usually results from a program incorrectly trying to access a valid memory address. These errors can be fixed in two ways.","title":"Memory Limits"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#request-more-memory","text":"The default is almost always --mem-per-cpu=5G In a batch script: #SBATCH --mem-per-cpu=8G In an interactive job: salloc --mem-per-cpu = 8G","title":"Request More Memory"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#use-less-memory","text":"This method is usually a little more involved, and is easier if you can inspect the code you are using. Watching your job's resource usage , attending a workshop , or getting in touch with us are good places to start.","title":"Use Less Memory"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#disk-quotas","text":"Since the clusters are shared resources, we have quotas in place to enforce fair use of storage. When you or your group reach a quota, you can't write to existing files or create new ones. Any jobs that depend on creating or writing files that count toward the affected quota will fail. To inspect your current usage, run the command getquota . Remember, your home quota is yours but your project, scratch60, and any purchased storage quotas are shared across your group.","title":"Disk Quotas"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#archive-files","text":"You may find that some files or direcories for previous projects are no longer needed on the cluster. We recommend you archive these to recover space.","title":"Archive Files"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#delete-files","text":"If you are sure you no longer need some files or direcories, you can delete them. Unless files are in your home directory (not project or scratch60 ) they are not backed up and may be unrecoverable. 
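Before removing anything, it is worth double-checking what you are about to delete and how much space it will free. A short sketch (the directory name is a placeholder):
# see how much space the directory takes up
du -sh ~/project/old_results
# remove it recursively, but ask for confirmation first
rm -rI ~/project/old_results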
Use the rm -rf command very carefully.","title":"Delete Files"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#buy-more-space","text":"If you would like to purchase more than the default quotas, we can help you buy space on the clusters .","title":"Buy More Space"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#rate-limits","text":"We rate-limit job submissions to 200 jobs per hour on each cluster. This limit helps even out load on the scheduler and encourages good practice. When you hit this limit, you will get an error when submitting new jobs that looks like this: sbatch: error: Reached jobs per hour limit sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) You will then need to wait until your submission rate drops.","title":"Rate Limits"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#use-job-arrays","text":"To avoid hitting this limit and make large numbers of jobs more manageable, you should use Dead Simple Queue or job arrays . If you need help adapting your workflow to dsq or job arrays contact us .","title":"Use Job Arrays"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#software-modules","text":"We build and organize software modules on the cluster using toolchains . The major toolchains we use produce modules that end in foss-yearletter or intel-yearletter, e.g. foss-2018b or intel-2018a . If modules from different toolchains are loaded at the same time, the conflicts that arise often lead to errors or strange application behavior. Seeing either of the following messages is a sign that you are loading incompatible modules. The following have been reloaded with a version change: 1) FFTW/3.3.7-gompi-2018a => FFTW/3.3.8-gompi-2018b 2) GCC/6.4.0-2.28 => GCC/7.3.0-2.3.0 3) GCCcore/6.4.0 => GCCcore/7.3.0 ... or GCCcore/7.3.0 exists but could not be loaded as requested.","title":"Software Modules"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#match-or-purge-your-toolchains","text":"Where possible, only use one toolchain at a time. When you want to use software from muliple toolchains run module purge between running new module load commands. If your work requires a version of software that is not installed, contact us .","title":"Match or Purge Your Toolchains"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#conda-environments","text":"Conda environments provide a nice way to manage python and R packages and modules. Conda acieves this by setting functions and environment variables that point to your environment files when you run conda activate . Unlike modules , conda environments are not completely forwarded into a job; having a conda environment loaded when you submit a job doesn't forward it well into your job. You will likely see messages about missing packages and libraries you definitely installed into the environment you want to use in your job.","title":"Conda Environments"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#load-conda-environments-right-before-use","text":"To make sure that your environment is set up properly for interactive use, wait until you are on the host you plan to use your environment on. Then run conda activate my_env . To make sure batch jobs function properly, only submit jobs without an environment loaded ( conda deactivate before sbatch ). 
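A minimal submission script following this pattern might look like the sketch below (my_env and my_script.py are placeholders):
#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --time=1:00:00
# load miniconda and activate the environment inside the job itself
module load miniconda
conda activate my_env
python my_script.py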
Make sure you load miniconda and your environment in the body of your batch submission script.","title":"Load Conda Environments Right Before Use"},{"location":"clusters-at-yale/job-scheduling/dependency/","text":"Jobs with Dependencies SLURM offers a tool which can help string jobs together via dependencies. When submitting a job, you can specify that it should wait to run until a specified job has finished. This provides a mechanism to create simple pipelines for managing complicated workflows. Simple Pipeline As a toy example, consider a two-step pipeline, first a data transfer followed by an analysis step. Here we will use the --dependency flag for sbatch and the afterok type that requires a job to finish successfully before starting the second step: The first step is controlled by a sbatch submission script called step1.sh : #!/bin/bash #SBATCH --job-name=DataTransfer #SBATCH -t 30:00 rsync -avP remote_host:/path/to/data.csv $HOME /project/ The second step is controlled by step2.sh : #!/bin/bash #SBATCH --job-name=DataProcess #SBATCH -t 5:00:00 module load miniconda source activate my_env python my_script.py $HOME /project/data.csv When we submit the first step (using the command sbatch step1.sh ) we obtain the jobid number for that job. We then submit the second step adding in the --dependency flag to tell Slurm that this job requires the first job to finish before it can start: sbatch --dependency = afterok:56761133 step2.sh When the 'transfer' job finishes successfully (without an error exit code) the 'processing' step will begin. While this is a simple dependency structure, it is possible to have multiple dependencies or more complicated structure. Job Clean-up One frequent use-case is a clean-up job that runs after all other jobs have finished. This is a common way to collect results from processing multiple files into a single output file. This can be done using the --dependency=singleton: flag that will wait until all previously launched jobs with the same name and user have finished. [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 [ tl397@grace1 ~ ] $ sbatch --dependency = singleton --job-name = JobName cleanup.sh [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345680 day JobName tl397 R 2020 -05-27T11:54 ( Dependency ) This last job will wait to run until all previous jobs with name JobName finish. Further Reading SLURM provides a number of options for logic controlling dependencies. Most common are the two discussed above, but --dependency=afternotok: can be useful to control behavior if a job fails. Full discussion of the options can be found on the SLURM manual page for sbatch (https://slurm.schedmd.com/sbatch.html). 
A very detailed overview, with examples in both bash and python, can also be found at the NIH computing reference: https://hpc.nih.gov/docs/job_dependencies.html.","title":"Jobs with Dependencies"},{"location":"clusters-at-yale/job-scheduling/dependency/#jobs-with-dependencies","text":"SLURM offers a tool which can help string jobs together via dependencies. When submitting a job, you can specify that it should wait to run until a specified job has finished. This provides a mechanism to create simple pipelines for managing complicated workflows.","title":"Jobs with Dependencies"},{"location":"clusters-at-yale/job-scheduling/dependency/#simple-pipeline","text":"As a toy example, consider a two-step pipeline, first a data transfer followed by an analysis step. Here we will use the --dependency flag for sbatch and the afterok type that requires a job to finish successfully before starting the second step: The first step is controlled by a sbatch submission script called step1.sh : #!/bin/bash #SBATCH --job-name=DataTransfer #SBATCH -t 30:00 rsync -avP remote_host:/path/to/data.csv $HOME /project/ The second step is controlled by step2.sh : #!/bin/bash #SBATCH --job-name=DataProcess #SBATCH -t 5:00:00 module load miniconda source activate my_env python my_script.py $HOME /project/data.csv When we submit the first step (using the command sbatch step1.sh ) we obtain the jobid number for that job. We then submit the second step adding in the --dependency flag to tell Slurm that this job requires the first job to finish before it can start: sbatch --dependency = afterok:56761133 step2.sh When the 'transfer' job finishes successfully (without an error exit code) the 'processing' step will begin. While this is a simple dependency structure, it is possible to have multiple dependencies or more complicated structure.","title":"Simple Pipeline"},{"location":"clusters-at-yale/job-scheduling/dependency/#job-clean-up","text":"One frequent use-case is a clean-up job that runs after all other jobs have finished. This is a common way to collect results from processing multiple files into a single output file. This can be done using the --dependency=singleton: flag that will wait until all previously launched jobs with the same name and user have finished. [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 [ tl397@grace1 ~ ] $ sbatch --dependency = singleton --job-name = JobName cleanup.sh [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345680 day JobName tl397 R 2020 -05-27T11:54 ( Dependency ) This last job will wait to run until all previous jobs with name JobName finish.","title":"Job Clean-up"},{"location":"clusters-at-yale/job-scheduling/dependency/#further-reading","text":"SLURM provides a number of options for logic controlling dependencies. Most common are the two discussed above, but --dependency=afternotok: can be useful to control behavior if a job fails. 
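As a sketch of that pattern, a follow-up job (here a hypothetical notify_failure.sh script) can be queued so that it runs only if the first job fails, using the job ID from the example above:

sbatch --dependency=afternotok:56761133 notify_failure.sh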
Full discussion of the options can be found on the SLURM manual page for sbatch (https://slurm.schedmd.com/sbatch.html). A very detailed overview, with examples in both bash and python, can also be found at the NIH computing reference: https://hpc.nih.gov/docs/job_dependencies.html.","title":"Further Reading"},{"location":"clusters-at-yale/job-scheduling/dsq/","text":"Job Arrays with dSQ Dead Simple Queue is a light-weight tool to help submit large batches of homogenous jobs to a Slurm -based HPC cluster. It wraps around slurm's sbatch to help you submit independent jobs as job arrays . Job arrays have several advantages over submitting your jobs in a loop: Your job array will grow during the run to use available resources, up to a limit you can set. Even if the cluster is busy, you probably get work done because each job from your array can be run independently. Your job will only use the resources needed to complete remaining jobs. It will shrink as your jobs finish, giving you and your peers better access to compute resources. If you run your array on a pre-emptable partition (scavenge on YCRC clusters), only individual jobs are preempted. Your whole array will continue. dSQ adds a few nice features on top of job arrays: Your jobs don't need to know they're running in an array; your job file is a great way to document what was done in a way that you can move to other systems relatively easily. You get a simple report of which job ran where and for how long dSQAutopsy can create a new job file that has only the jobs that didn't complete from your last run. All you need is Python 2.7+, or Python 3. dSQ is not recommended for situations where the initialization of the job takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler. Step 1: Create Your Job File First, you'll need to generate a job file. Each line of this job file needs to specify exactly what you want run for each job, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with # will be ignored when submitting your job array. Note: slurm jobs start in the directory from which your job was submitted. For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that each job needs 4 GiB of RAM, and will run in less than 20 minutes. Create a file with the jobs you want to run, one per line. A simple loop that prints your jobs should usually suffice. A job can be a simple command invocation, or a sequence of commands. You can call the job file anything, but for this example assume it's called \"joblist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... 
module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 Avoid Very Short Jobs When building your job file, please bundle very short jobs (less than a minute) such that each element of the job array will run for at least 10 minutes. You can do this by putting multiple tasks on a single line, separated by a ; . In the same vein, avoid jobs that simply check for a previous successful completion and then exit. See dSQAutopsy below for a way to completely avoid submitting these types of jobs. Our clusters are not tuned for extremely high throughput jobs. Therefore, large numbers of very short jobs put a lot of strain on both the scheduler, resulting in delays in scheduling other users' jobs, and the storage, due to large numbers of I/O operations. Step 2: Generate Batch Script with dsq On YCRC clusters you can load Dead Simple Queue onto your path with: module load dSQ You can also download or clone this repo and use the scripts directly. dsq takes a few arguments, then writes a job submission script (default) or can directly submit a job for you. The resources you request will be given to each job in the array (each line in your job file) , e.g. requesting 2 GiB of RAM with dSQ will run each individual job with a separate 2 GiB of RAM available. Run sbatch --help or see the official Slurm documentation for more info on sbatch options. dSQ will set a default job name of dsq-jobfile (your job file name without the file extension). dSQ will also set the job output file name pattern to dsq-jobfile-%A_%a-%N.out, which will capture each of your jobs' output to a file with the job's ID(%A), its array index or zero-based line number(%a), and the host name of the node it ran on (%N). If you are handling output in each of your jobs, set this to /dev/null , which will stop these files from being created. Required Arguments: --job-file jobs.txt Job file, one self-contained job per line. Optional Arguments: -h, --help Show this help message and exit. --version show program's version number and exit --batch-file sub_script.sh Name for batch script file. Defaults to dsq-jobfile-YYYY-MM-DD.sh -J jobname, --job-name jobname Name of your job array. Defaults to dsq-jobfile --max-jobs number Maximum number of simultaneously running jobs from the job array. -o fmt_string, --output fmt_string Slurm output file pattern. There will be one file per line in your job file. To suppress slurm out files, set this to /dev/null. Defaults to dsq-jobfile-%A_%a-%N.out --status-dir dir Directory to save the job_jobid_status.tsv file to. Defaults to working directory. --suppress-stats-file Don't save job stats to job_jobid_status.tsv --submit Submit the job array on the fly instead of creating a submission script. In the example above, we want walltime of 20 minutes and memory=4GiB per job. Our invocation would be: dsq --job-file joblist.txt --mem-per-cpu 4g -t 20 :00 --mail-type ALL The dsq command will create a file called dsq-joblist-yyyy-mm-dd.sh , where the y, m, and d are today's date. After creating the batch script, take a look at its contents. You can further modify the Slurm directives in this file before submitting. 
#!/bin/bash #SBATCH --array 0-999 #SBATCH --output dsq-joblist-%A_%3a-%N.out #SBATCH --job-name dsq-joblist #SBATCH --mem-per-cpu 4g -t 20:00 --mail-type ALL # DO NOT EDIT LINE BELOW /path/to/dSQBatch.py --job-file /path/to/joblist.txt --status-dir /path/to/here Step 3: Submit Batch Script sbatch dsq-joblist-yyyy-mm-dd.sh Manage Your dSQ Job You can refer to any portion of your job with jobid_index syntax, or the entire array with its jobid. The index Dead Simple Queue uses starts at zero , so the 3rd line in your job file will have an index of 2. You can also specify ranges. # to cancel job 4 for array job 14567 scancel 14567_4 # to cancel jobs 10-20 for job 14567: scancel 14567_[10-20] dSQ Output You can monitor the status of your jobs in Slurm by using squeue -u , squeue -j , or dsqa -j . dSQ creates a file named job_jobid_status.tsv , unless you suppress this output with --suppress-stats-file . This file will report the success or failure of each job as it finishes. Note this file will not contain information for any jobs that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed jobs in the following tab-separated columns: Job_ID: the zero-based line number from your job file. Exit_Code: exit code returned from your job (non-zero number generally indicates a failed job). Hostname: The hostname of the compute node that this job ran on. Time_Started: time started, formatted as year-month-day hour:minute:second. Time_Ended: time ended, formatted as year-month-day hour:minute:second. Time_Elapsed: in seconds. Job: the line from your job file. dSQAutopsy You can use dSQAutopsy or dsqa to create a simple report of the array of jobs, and a new jobsfile that contains just the jobs you want to re-run if you specify the original jobsfile. Options listed below -j JOB_ID, --job-id JOB_ID The Job ID of a running or completed dSQ Array -f JOB_FILE, --job-file JOB_FILE Job file, one job per line (not your job submission script). -s STATES, --states STATES Comma separated list of states to use for re-writing job file. Default: CANCELLED,NODE_FAIL,PREEMPTED Asking for a simple report: dsqa -j 13233846 Produces one State Summary for Array 13233846 State Num_Jobs Indices ----- -------- ------- COMPLETED 12 4,7-17 RUNNING 5 1-3,5-6 PREEMPTED 1 0 You can redirect the report and the failed jobs to separate files: dsqa -j 2629186 -f jobsfile.txt > re-run_jobs.txt 2> 2629186_report.txt","title":"Job Arrays with dSQ"},{"location":"clusters-at-yale/job-scheduling/dsq/#job-arrays-with-dsq","text":"Dead Simple Queue is a light-weight tool to help submit large batches of homogeneous jobs to a Slurm -based HPC cluster. It wraps around slurm's sbatch to help you submit independent jobs as job arrays . Job arrays have several advantages over submitting your jobs in a loop: Your job array will grow during the run to use available resources, up to a limit you can set. Even if the cluster is busy, you probably get work done because each job from your array can be run independently. Your job will only use the resources needed to complete remaining jobs. It will shrink as your jobs finish, giving you and your peers better access to compute resources. If you run your array on a pre-emptable partition (scavenge on YCRC clusters), only individual jobs are preempted. Your whole array will continue. 
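As a rough end-to-end sketch of the workflow described on this page, where process_sample.sh is a hypothetical per-sample command standing in for the bowtie2/samtools pipeline above and the resource requests are assumptions:

# 1. generate a job file, one self-contained job per line
for i in $(seq 1 1000); do
    echo "./process_sample.sh sample${i}.fastq"
done > joblist.txt

# 2. generate the submission script, requesting resources for each job in the array
module load dSQ
dsq --job-file joblist.txt --mem-per-cpu 4g -t 20:00

# 3. submit the generated batch script (named with today's date)
sbatch dsq-joblist-$(date +%Y-%m-%d).sh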
dSQ adds a few nice features on top of job arrays: Your jobs don't need to know they're running in an array; your job file is a great way to document what was done in a way that you can move to other systems relatively easily. You get a simple report of which job ran where and for how long dSQAutopsy can create a new job file that has only the jobs that didn't complete from your last run. All you need is Python 2.7+, or Python 3. dSQ is not recommended for situations where the initialization of the job takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler.","title":"Job Arrays with dSQ"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-1-create-your-job-file","text":"First, you'll need to generate a job file. Each line of this job file needs to specify exactly what you want run for each job, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with # will be ignored when submitting your job array. Note: slurm jobs start in the directory from which your job was submitted. For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that each job needs 4 GiB of RAM, and will run in less than 20 minutes. Create a file with the jobs you want to run, one per line. A simple loop that prints your jobs should usually suffice. A job can be a simple command invocation, or a sequence of commands. You can call the job file anything, but for this example assume it's called \"joblist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 Avoid Very Short Jobs When building your job file, please bundle very short jobs (less than a minute) such that each element of the job array will run for at least 10 minutes. You can do this by putting multiple tasks on a single line, separated by a ; . In the same vein, avoid jobs that simply check for a previous successful completion and then exit. See dSQAutopsy below for a way to completely avoid submitting these types of jobs. Our clusters are not tuned for extremely high throughput jobs. Therefore, large numbers of very short jobs put a lot of strain on both the scheduler, resulting in delays in scheduling other users' jobs, and the storage, due to large numbers of I/O operations.","title":"Step 1: Create Your Job File"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-2-generate-batch-script-with-dsq","text":"On YCRC clusters you can load Dead Simple Queue onto your path with: module load dSQ You can also download or clone this repo and use the scripts directly. dsq takes a few arguments, then writes a job submission script (default) or can directly submit a job for you. The resources you request will be given to each job in the array (each line in your job file) , e.g. 
requesting 2 GiB of RAM with dSQ will run each individual job with a separate 2 GiB of RAM available. Run sbatch --help or see the official Slurm documentation for more info on sbatch options. dSQ will set a default job name of dsq-jobfile (your job file name without the file extension). dSQ will also set the job output file name pattern to dsq-jobfile-%A_%a-%N.out, which will capture each of your jobs' output to a file with the job's ID(%A), its array index or zero-based line number(%a), and the host name of the node it ran on (%N). If you are handling output in each of your jobs, set this to /dev/null , which will stop these files from being created. Required Arguments: --job-file jobs.txt Job file, one self-contained job per line. Optional Arguments: -h, --help Show this help message and exit. --version show program's version number and exit --batch-file sub_script.sh Name for batch script file. Defaults to dsq-jobfile-YYYY-MM-DD.sh -J jobname, --job-name jobname Name of your job array. Defaults to dsq-jobfile --max-jobs number Maximum number of simultaneously running jobs from the job array. -o fmt_string, --output fmt_string Slurm output file pattern. There will be one file per line in your job file. To suppress slurm out files, set this to /dev/null. Defaults to dsq-jobfile-%A_%a-%N.out --status-dir dir Directory to save the job_jobid_status.tsv file to. Defaults to working directory. --suppress-stats-file Don't save job stats to job_jobid_status.tsv --submit Submit the job array on the fly instead of creating a submission script. In the example above, we want walltime of 20 minutes and memory=4GiB per job. Our invocation would be: dsq --job-file joblist.txt --mem-per-cpu 4g -t 20 :00 --mail-type ALL The dsq command will create a file called dsq-joblist-yyyy-mm-dd.sh , where the y, m, and d are today's date. After creating the batch script, take a look at its contents. You can further modify the Slurm directives in this file before submitting. #!/bin/bash #SBATCH --array 0-999 #SBATCH --output dsq-joblist-%A_%3a-%N.out #SBATCH --job-name dsq-joblist #SBATCH --mem-per-cpu 4g -t 10:00 --mail-type ALL # DO NOT EDIT LINE BELOW /path/to/dSQBatch.py --job-file /path/to/joblist.txt --status-dir /path/to/here","title":"Step 2: Generate Batch Script with dsq"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-3-submit-batch-script","text":"sbatch dsq-joblist-yyyy-mm-dd.sh","title":"Step 3: Submit Batch Script"},{"location":"clusters-at-yale/job-scheduling/dsq/#manage-your-dsq-job","text":"You can refer to any portion of your job with jobid_index syntax, or the entire array with its jobid. The index Dead Simple Queue uses starts at zero , so the 3rd line in your job file will have an index of 2. You can also specify ranges. # to cancel job 4 for array job 14567 scancel 14567_4 # to cancel jobs 10-20 for job 14567: scancel 14567_ [ 10 -20 ]","title":"Manage Your dSQ Job"},{"location":"clusters-at-yale/job-scheduling/dsq/#dsq-output","text":"You can monitor the status of your jobs in Slurm by using squeue -u , squeue -j , or dsqa -j . dSQ creates a file named job_jobid_status.tsv , unless you suppress this output with --supress-stats-file . This file will report the success or failure of each job as it finishes. Note this file will not contain information for any jobs that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed jobs in the following tab-separated columns: Job_ID: the zero-based line number from your job file. 
Exit_Code: exit code returned from your job (non-zero number generally indicates a failed job). Hostname: The hostname of the compute node that this job ran on. Time_Started: time started, formatted as year-month-day hour:minute:second. Time_Ended: time ended, formatted as year-month-day hour:minute:second. Time_Elapsed: in seconds. Job: the line from your job file.","title":"dSQ Output"},{"location":"clusters-at-yale/job-scheduling/dsq/#dsqautopsy","text":"You can use dSQAutopsy or dsqa to create a simple report of the array of jobs, and a new jobsfile that contains just the jobs you want to re-run if you specify the original jobsfile. Options listed below -j JOB_ID, --job-id JOB_ID The Job ID of a running or completed dSQ Array -f JOB_FILE, --job-file JOB_FILE Job file, one job per line (not your job submission script). -s STATES, --states STATES Comma separated list of states to use for re-writing job file. Default: CANCELLED,NODE_FAIL,PREEMPTED Asking for a simple report: dsqa -j 13233846 Produces one State Summary for Array 13233846 State Num_Jobs Indices ----- -------- ------- COMPLETED 12 4,7-17 RUNNING 5 1-3,5-6 PREEMPTED 1 0 You can redirect the report and the failed jobs to separate files: dsqa -j 2629186 -f jobsfile.txt > re-run_jobs.txt 2> 2629186_report.txt","title":"dSQAutopsy"},{"location":"clusters-at-yale/job-scheduling/fairshare/","text":"Priority & Wait Time Job Priority Score Fairshare To ensure well-balanced access to cluster resources, we institute a fairshare system on our clusters. In practice this means jobs have a priority score that dictates when they can be run in relation to other jobs. This score is affected by the amount of CPU-equivalent hours used by a group in the past few weeks. The number of CPU-equivalents allocated to a job is defined as the larger of (a) the number of requested cores and (b) the total amount of requested memory divided by the default memory per core (usually 5G/core). For example, a job that requests 2 cores and 30GiB of memory counts as 6 CPU-equivalents, since 30GiB divided by 5GiB per core is larger than the 2 requested cores. If a group has used a large amount of CPU-equivalent hours, their jobs are given a lower priority score and therefore will take longer to start if the cluster is busy. Regardless of a job's priority, the scheduler still considers all jobs for backfill (see below). To see all pending jobs sorted by priority (jobs with higher priority at the top), use the following squeue command: squeue --sort=-p -t PD -p To monitor usage of members of your group, run the sshare command: sshare -a -A Note: Resources used on private partitions do not affect fairshare. Similarly, resources used in the scavenge partition cost 10% of comparable resources in the other partitions. Length of Time in Queue In addition to fairshare, any pending job will accrue priority over time, which can help overcome small fairshare penalties. To see the factors affecting your job's priority, run the following sprio command: sprio -j Backfill In addition to the main scheduling cycle, where jobs are run in the order of priority and availability of resources, all jobs are also considered for \"backfill\". Backfill is a mechanism that lets jobs with lower priority scores start before higher priority jobs if they can fit in around them. For example, if a higher priority job needs 4 nodes with 20 cores each and will have to wait 30 hours for those resources to become available, and a lower priority job only needs a couple of cores for an hour, Slurm will run the lower priority job in the meantime. For this reason, it is important to request accurate walltime limits for your jobs. 
If your job only requires 2 hours to run, but you request 24 hours, the likelihood that your job will be backfilled is greatly lowered. Moreover, for performance reasons, the backfill scheduler on Grace only looks at the top 10 jobs by each user. Therefore, if you bundle similar jobs into job arrays (see dSQ ), the backfill cycle will consider more of your jobs since entire job arrays only count as one job for the limit accounting.","title":"Priority & Wait Time"},{"location":"clusters-at-yale/job-scheduling/fairshare/#priority-wait-time","text":"","title":"Priority & Wait Time"},{"location":"clusters-at-yale/job-scheduling/fairshare/#job-priority-score","text":"","title":"Job Priority Score"},{"location":"clusters-at-yale/job-scheduling/fairshare/#fairshare","text":"To ensure well-balanced access to cluster resources, we institute a fairshare system on our clusters. In practice this means jobs have a priority score that dictates when it can be run in relation to other jobs. This score is affected by the amount of CPU-equivalent hours used by a group in the past few weeks. The number of CPU-equivalents allocated to a job is defined as the larger of (a) the number of requested cores and (b) the total amount of requested memory divided by the default memory per core (usually 5G/core). If a group has used a large amount of CPU-equivalent hours, their jobs are given a lower priority score and therefore will take longer to start if the cluster is busy. Regardless of a job's prority, the scheduler still considers all jobs for backfill (see below). To see all pending jobs sorted by priority (jobs with higher priority at the top), use the following squeue command: squeue --sort=-p -t PD -p To monitor usage of members of your group, run the sshare command: sshare -a -A Note: Resources used on private partitions do not count affect fairshare. Similarly, resources used in the scavenge partition cost 10% of comparable resources in the other partitions.","title":"Fairshare"},{"location":"clusters-at-yale/job-scheduling/fairshare/#length-of-time-in-queue","text":"In addition to fairshare, any pending job will accrue priority over time, which can help overcome small fairshare penalties. To see the factors affecting your job's priority, run the following sprio command: sprio -j ","title":"Length of Time in Queue"},{"location":"clusters-at-yale/job-scheduling/fairshare/#backfill","text":"In addition to the main scheduling cycle, where jobs are run in the order of priority and availability of resources, all jobs are also considered for \"backfill\". Backfill is a mechanism which will let jobs with lower priority score start before high priority jobs if they can fit in around them. For example, if a higher priority job needs 4 nodes with 20 cores on each node and it will have to wait 30 hours for those resources to be available, if a lower priority job only needs a couple cores for an hour, Slurm will run that job in the meantime. For this reason, it is important to request accurate walltime limits for your jobs. If your job only requires 2 hours to run, but you request 24 hours, the likelihood that your job will be backfilled is greatly lowered. Moreover, for performance reasons, the backfill scheduler on Grace only looks at the top 10 jobs by each user. 
Therefore, if you bundle similar jobs into job arrays (see dSQ ), the backfill cycle will consider more of your jobs since entire job arrays only count as one job for the limit accounting.","title":"Backfill"},{"location":"clusters-at-yale/job-scheduling/mpi/","text":"MPI Partition Grace has a special common partition called mpi . The mpi partition is a bit different from other partitions on Grace--it always allocates entire nodes to jobs submitted to the partition. Each node in the mpi partition is an identical 24-core, 2x Skylake Gold 6136, 96GiB RAM (90GiB usable) node. While this partition is available to all Grace users, only certain types of jobs are allowed on the partition (similar to the restrictions on our GPU partitions). In addition to the common partition mpi , there is a scavenge_mpi partition. This partition has the same purpose and limitations as the regular mpi partition, but allows users to run at a lower priority (e.g. subject to preemption if nodes are requested in the mpi partition ) without incurring cpu charges. Appropriate Jobs This partition is specifically designed to support jobs that use tightly-coupled MPI-enabled applications that will run across multiple nodes and are sensitive to sharing their nodes with other jobs. Since every node on the mpi partition is identical, it can support workloads that are sensitive to hardware differences across a single job. We expect most jobs submitted to mpi to use all 24 cores on each node. There are occasional instances where a tightly coupled application will use multiple nodes but less than all 24 cores due to load balancing or memory limitations. For example, some applications require a power-of-2 number of cores in the job, and 24 cores per node doesn't always divide evenly into those configurations. So we occasionally see jobs that use multiple nodes but only 16 of the 24 cores per node; these are also acceptable submissions to the mpi partition. Jobs that do not require exclusive nodes, even if they use mpirun to launch, will run fine and experience normal wait times in the day and week (and scavenge) partitions. As such, we ask you to protect the special mpi partition nodes for the more resource-sensitive jobs listed above and, therefore, submit any jobs that will not be using whole node(s) to the other partitions. If smaller or single core jobs are submitted to the mpi partition, they may be cancelled without warning. As with our GPU partitions, if you would like to make use of available cores on any mpi nodes for small jobs, the scavenge partition is the correct way to do that. If you have any questions about whether your workload is appropriate for the mpi partition, please contact us . Compilation There is one node in the devel partition that is identical to the mpi partition nodes. If you choose to compile your code with advanced optimization flags specific to the new generation of compute nodes, you can request that node in the devel partition with the -C skylake submission flag. Core Layouts Please review the Request Compute Resources documentation for the appropriate Slurm flags for different types of core and node layouts. If you have any questions, feel free to contact us .","title":"MPI Partition"},{"location":"clusters-at-yale/job-scheduling/mpi/#mpi-partition","text":"Grace has a special common partition called mpi . The mpi partition is a bit different from other partitions on Grace--it always allocates entire nodes to jobs submitted to the partition. 
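A sketch of a whole-node job on this partition; the application name my_mpi_app and the toolchain module are assumptions, and the node count and walltime should be adapted to your own workload:

#!/bin/bash
#SBATCH --partition=mpi
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=24   # use all 24 cores on each mpi node
#SBATCH --time=12:00:00

module load foss/2018b   # load whichever toolchain provides your MPI
mpirun my_mpi_app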
Each node in the mpi partition are identical 24 core, 2x Skylake Gold 6136, 96GiB RAM (90GiB usable) nodes. While this partition is available to all Grace users, only certain types of jobs are allowed on the partition (similar to the restrictions on our GPU partitions). In addition the the common partition mpi , there is a scavenge_mpi partition. This partition is has the same purpose and limitations as the regular mpi partition, but allows users to run a lower priority (e.g. subject to preemption if nodes are requested in the mpi partition ) without incurring cpu charges.","title":"MPI Partition"},{"location":"clusters-at-yale/job-scheduling/mpi/#appropriate-jobs","text":"This partition is specifically designed to support jobs that use tightly-coupled MPI-enabled applications that will run across multiple nodes and are sensitive to sharing their nodes with other jobs. Since every node on the mpi partition is identical, it can support workloads that are sensitive to hardware difference across a single job. We expect most of jobs submitted to mpi to use all 24 cores on each node. There are occasionally instances where a tightly coupled application will use multiple nodes but less than all 24 cores due to load balancing or memory limitations. For example, some applications require power of 2 cores in the job, but 24 cores doesn't always divide evenly into those configurations. So we occasionally see jobs that use multiple nodes but only 16 of the 24 cores per node and are also acceptable submissions to the mpi partition. Jobs that do not require exclusive nodes, even if they use mpirun to launch, will run fine and experience normal wait times in the day and week (and scavenge) partitions. As such, we ask you to protect the special mpi partition nodes for the more resource sensitive jobs listed above and, therefore, submit any jobs that will not be using whole node(s) to the other partitions. If smaller or single core jobs are submitted to the mpi partition, they may be cancelled without warning. As with our GPU partitions, if you would like to make use of available cores on any mpi nodes for small jobs, the scavenge partition is the correct way to do that. If you have any questions about whether your workload is appropriate for the mpi partition, please contact us .","title":"Appropriate Jobs"},{"location":"clusters-at-yale/job-scheduling/mpi/#compilation","text":"There is one node in the devel partition that is identical to the mpi partition nodes. If you choose to compile your code with advanced optimization flags specific to the new generation of compute nodes, you can request that node in the devel partition with the -C skylake submission flag.","title":"Compilation"},{"location":"clusters-at-yale/job-scheduling/mpi/#core-layouts","text":"Please review the Request Compute Resources documentation for the appropriate Slurm flags for different types of core and node layouts. If you have any questions, feel free to contact us .","title":"Core Layouts"},{"location":"clusters-at-yale/job-scheduling/resource-requests/","text":"Request Compute Resources Request Cores and Nodes When running jobs with Slurm , you must be explicit about requesting CPU cores and nodes. See our page on monitoring usage for tips on verifying your jobs are using the resources you expect. The three options --nodes or -N , --ntasks or -n , and --cpus-per-task or -c can be a bit confusing at first but are necessary to understand for applications that use more than one CPU. 
Tip If your application references threads or cores but makes no mention of MPI, only use --cpus-per-task to request CPUs. You cannot request more cores than there are on a single compute node where your job runs. Multi-thread, Multi-process, and MPI The majority of applications in the world were written to use one or more cores on a single computer. Most can only use one core, and do not benefit from being given more cores. The best way to speed these applications up is to run many separate jobs at once, using Dead Simple Queue or job arrays . If an application is able to use multiple cores, it usually achieves this by either spawning threads and sharing memory (multi-threaded) or starting entire new processes (multi-process). Some applications are written to use the Message Passing Interface (MPI) standard to run across many compute nodes. This allows such applications to scale computation in a way not limited by the number of cores on a single node. MPI translates what Slurm calls tasks to separate workers or processes. Because each of these processes can communicate across compute nodes, Slurm does not constrain them to the same node by default. Though tasks can be distributed across nodes, Slurm will not split the CPUs allocated to individual tasks. For this reason a single task that has multiple CPUs allocated will always be on a single node. In some cases using --ntasks=4 (or -n 4 ) and --cpus-per-task=4 (or -c 4 ) achieves the same job allocation by luck, but you should only use --cpus-per-task when using non-MPI applications to guarantee that the CPUs you expect your program to use are all accessable. Some MPI programs are also multi-threaded, so each process can use multiple CPUs. Only these applications can use --ntasks and --cpus-per-task to run faster. MPI Applications For more control over how Slurm lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. Slurm will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is likely if you request a small number of tasks that they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs on your cluster's nodes and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O or other reasons, you can also set --ntasks-per-node=1 . Note that the following must be true: ntasks-per-node * nodes >= ntasks Hybrid (MPI+OpenMP) Applications For the most predictable performance for hybrid applications, you will need to use all three of the --ntasks , --cpus-per-task , and --nodes flags, where --ntasks equals the number of MPI tasks, --cpus-per-task equals the number of OMP_NUM_THREADS and --nodes is the number of nodes required to fit --ntasks * --cpus-per-task . Request Memory (RAM) Slurm strictly enforces the memory your job can use. If you request 5GiB of memory for your job and the total used by all processes you launch hits that limit, some of your processes may die and you will get errors . Make sure you either request the right amount of memory per core on each node in your job with --mem-per-cpu or memory per node in your job with --mem . You can request more memory than you think you might need for an example job, then make note of its actual usage to better tune future requests for similar jobs. 
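Pulling the cores, nodes, and memory sections together, here is a sketch of a hybrid MPI+OpenMP request with 8 MPI tasks of 4 threads each spread over 2 nodes; the application name my_hybrid_app is an assumption:

#!/bin/bash
#SBATCH --nodes=2             # nodes needed to fit ntasks * cpus-per-task
#SBATCH --ntasks=8            # number of MPI tasks
#SBATCH --cpus-per-task=4     # threads per task
#SBATCH --mem-per-cpu=5G      # memory is enforced per allocated CPU
#SBATCH --time=4:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
mpirun my_hybrid_app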
Request GPUs Some of our clusters have nodes that contain GPU co-processors. Please refer to the individual cluster pages regarding node configurations that include GPUs. There are several salloc / sbatch options that allow you to request GPUs and specify your job layout relative to the GPUs requested. Long Option Short Option Description --cpus-per-gpu Use instead of --cpus-per-task to specify number of CPUs per allocated GPU. --gpus -G Specify the total number of GPUs required for the job either with number or type:number. --gpus-per-node Specify the number of GPUs per node , either with number or type:number. New option similar to --gres=gpu . --gpus-per-task Specify the number of GPUs per task , either with number or type:number. --mem-per-gpu * Request system memory that scales per GPU. The --mem , --mem-per-cpu and --mem-per-gpu options are mutually exclusive --constraint -C Request a selection of GPU types (separate types with | ). This option requires the --gpus option for GPU selection. * The --mem-per-gpu flag does not currently work as intended, please do not use. Request memory using --mem or --mem-per-cpu in the meantime. In order for your job to be able to access gpus, you must submit your job to a partition that contains nodes with GPUs and request them - the default GPU request for jobs is to not request any . Some applications require double-precision capable GPUs. If yours does, see the next section for using \"features\" to request any node with compatible GPUs. The Slurm options --mem , --mem-per-gpu and --mem-per-cpu do not request memory on GPUs, sometimes called vRAM. Instead you are allocated the GPU(s) requested and all attached GPU memory for your jobs. Memory accessible on GPUs is limited by their model, and is also listed on each cluster page. Request Specific GPU Types If your job can only run on a subset of the GPU types available in the partition, you can request one or more specific types of GPUs. To request a specific type of GPU, use type:number notation. For example, to request an NVIDIA P100. sbatch --cpus-per-gpu=2 --gpus=p100:1 --time=6:00:00 --partition gpu my_gpu_job.sh To submit your job to a number of GPU options (such as NVIDIA P100, V100 or A100), use a combination of the constraint flag ( -C ) and the --gpus flag (with just a number). For the constraint flag , separate the different GPU type names with the pipe character ( | ). Your job will then start on a node with any of those GPU types. This is not guaranteed to work as expected if you are requesting multiple nodes. GPU type names can be found in the partition tables on each respective cluster page. sbatch -C \"p100|v100|a100\" --gpus=1 --time=6:00:00 --partition gpu my_gpu_job.sh Tip As with requesting multiple cores or multiple nodes, we strongly recommend that you test your jobs using the gpu_devel partition to make sure they can well utilize multiple GPUs before requesting them; allocating more GPUs does not speed up code that can only use one at a time. Here is an example interactive request that would allocate two GPUs and four CPUs for thirty minutes: salloc --cpus-per-gpu=2 --gpus=2 --time=30:00 --partition gpu_devel For more documentation on using GPUs on our clusters, please see GPUs and CUDA . Features and Constraints You may want to run programs that require specific hardware. To ensure your job runs on specific types of nodes, use the --constraint flag. You can use the processor codename (e.g. haswell ) or processor type (e.g. 
E5-2660_v3 ) to limit your job to specific node types. You can also specify an instruction set (e.g. avx512 ) to require that no matter what CPU your job runs on, it must understand at least these instructions. See the individual cluster pages for the exact tags for the different node types. Multiple requirements (\"AND\") are separated by a comma ( , ) and multiple options (\"OR\") should be separated by the pipe character ( | ). # run on a node with a haswell codenamed CPU (e.g. a E5-2660 v3) sbatch --constraint = haswell submit.sh # only run on nodes with E5-2660 v4 CPUs sbatch --constraint = E5-2660_v4 submit.sh We also have keyword features to help you constrain your jobs to certain categories of nodes. oldest : the oldest generation of node on the cluster. Use this constraint when compiling code if you wish to ensure it can run on any standard node on the cluster. nogpu : nodes without GPUs. standard : nodes without GPUs or extra memory. Useful for protecting special nodes in a private partition for jobs that can use the extra capabilities. singleprecision : nodes with single-precision only capable GPUs (e.g. GTX 1080s, RTX 2080s). doubleprecision : nodes with double-precision capable GPUs (e.g. K80s, P100s and V100s). GPU type (e.g. v100 ): nodes with a specific type of GPU. bigtmp : nodes with at least 1.5T of local storage in /tmp . Useful to ensure that your code will have sufficient space if it uses local storage (e.g. Gaussian's $GAUSS_SCRDIR ). Tip Use the command scontrol show node , replacing with the node's name you're interested in, to see more information about the node including its features.","title":"Request Compute Resources"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-compute-resources","text":"","title":"Request Compute Resources"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-cores-and-nodes","text":"When running jobs with Slurm , you must be explicit about requesting CPU cores and nodes. See our page on monitoring usage for tips on verifying your jobs are using the resources you expect. The three options --nodes or -N , --ntasks or -n , and --cpus-per-task or -c can be a bit confusing at first but are necessary to understand for applications that use more than one CPU. Tip If your application references threads or cores but makes no mention of MPI, only use --cpus-per-task to request CPUs. You cannot request more cores than there are on a single compute node where your job runs.","title":"Request Cores and Nodes"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#multi-thread-multi-process-and-mpi","text":"The majority of applications in the world were written to use one or more cores on a single computer. Most can only use one core, and do not benefit from being given more cores. The best way to speed these applications up is to run many separate jobs at once, using Dead Simple Queue or job arrays . If an application is able to use multiple cores, it usually achieves this by either spawning threads and sharing memory (multi-threaded) or starting entire new processes (multi-process). Some applications are written to use the Message Passing Interface (MPI) standard to run across many compute nodes. This allows such applications to scale computation in a way not limited by the number of cores on a single node. MPI translates what Slurm calls tasks to separate workers or processes. 
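For the common case of a single multi-threaded (non-MPI) program, a minimal request looks like the sketch below; the program name my_threaded_app and its --threads option are assumptions standing in for your own application:

#!/bin/bash
#SBATCH --cpus-per-task=8     # cores available to the threaded program
#SBATCH --mem-per-cpu=5G
#SBATCH --time=2:00:00

my_threaded_app --threads ${SLURM_CPUS_PER_TASK}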
Because each of these processes can communicate across compute nodes, Slurm does not constrain them to the same node by default. Though tasks can be distributed across nodes, Slurm will not split the CPUs allocated to individual tasks. For this reason a single task that has multiple CPUs allocated will always be on a single node. In some cases using --ntasks=4 (or -n 4 ) and --cpus-per-task=4 (or -c 4 ) achieves the same job allocation by luck, but you should only use --cpus-per-task when using non-MPI applications to guarantee that the CPUs you expect your program to use are all accessable. Some MPI programs are also multi-threaded, so each process can use multiple CPUs. Only these applications can use --ntasks and --cpus-per-task to run faster.","title":"Multi-thread, Multi-process, and MPI"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#mpi-applications","text":"For more control over how Slurm lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. Slurm will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is likely if you request a small number of tasks that they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs on your cluster's nodes and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O or other reasons, you can also set --ntasks-per-node=1 . Note that the following must be true: ntasks-per-node * nodes >= ntasks","title":"MPI Applications"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#hybrid-mpiopenmp-applications","text":"For the most predictable performance for hybrid applications, you will need to use all three of the --ntasks , --cpus-per-task , and --nodes flags, where --ntasks equals the number of MPI tasks, --cpus-per-task equals the number of OMP_NUM_THREADS and --nodes is the number of nodes required to fit --ntasks * --cpus-per-task .","title":"Hybrid (MPI+OpenMP) Applications"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-memory-ram","text":"Slurm strictly enforces the memory your job can use. If you request 5GiB of memory for your job and the total used by all processes you launch hits that limit, some of your processes may die and you will get errors . Make sure you either request the right amount of memory per core on each node in your job with --mem-per-cpu or memory per node in your job with --mem . You can request more memory than you think you might need for an example job, then make note of its actual usage to better tune future requests for similar jobs.","title":"Request Memory (RAM)"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-gpus","text":"Some of our clusters have nodes that contain GPU co-processors. Please refer to the individual cluster pages regarding node configurations that include GPUs. There are several salloc / sbatch options that allow you to request GPUs and specify your job layout relative to the GPUs requested. Long Option Short Option Description --cpus-per-gpu Use instead of --cpus-per-task to specify number of CPUs per allocated GPU. --gpus -G Specify the total number of GPUs required for the job either with number or type:number. --gpus-per-node Specify the number of GPUs per node , either with number or type:number. 
New option similar to --gres=gpu . --gpus-per-task Specify the number of GPUs per task , either with number or type:number. --mem-per-gpu * Request system memory that scales per GPU. The --mem , --mem-per-cpu and --mem-per-gpu options are mutually exclusive --constraint -C Request a selection of GPU types (separate types with | ). This option requires the --gpus option for GPU selection. * The --mem-per-gpu flag does not currently work as intended, please do not use. Request memory using --mem or --mem-per-cpu in the meantime. In order for your job to be able to access gpus, you must submit your job to a partition that contains nodes with GPUs and request them - the default GPU request for jobs is to not request any . Some applications require double-precision capable GPUs. If yours does, see the next section for using \"features\" to request any node with compatible GPUs. The Slurm options --mem , --mem-per-gpu and --mem-per-cpu do not request memory on GPUs, sometimes called vRAM. Instead you are allocated the GPU(s) requested and all attached GPU memory for your jobs. Memory accessible on GPUs is limited by their model, and is also listed on each cluster page.","title":"Request GPUs"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-specific-gpu-types","text":"If your job can only run on a subset of the GPU types available in the partition, you can request one or more specific types of GPUs. To request a specific type of GPU, use type:number notation. For example, to request an NVIDIA P100. sbatch --cpus-per-gpu=2 --gpus=p100:1 --time=6:00:00 --partition gpu my_gpu_job.sh To submit your job to a number of GPU options (such as NVIDIA P100, V100 or A100), use a combination of the constraint flag ( -C ) and the --gpus flag (with just a number). For the constraint flag , separate the different GPU type names with the pipe character ( | ). Your job will then start on a node with any of those GPU types. This is not guaranteed to work as expected if you are requesting multiple nodes. GPU type names can be found in the partition tables on each respective cluster page. sbatch -C \"p100|v100|a100\" --gpus=1 --time=6:00:00 --partition gpu my_gpu_job.sh Tip As with requesting multiple cores or multiple nodes, we strongly recommend that you test your jobs using the gpu_devel partition to make sure they can well utilize multiple GPUs before requesting them; allocating more GPUs does not speed up code that can only use one at a time. Here is an example interactive request that would allocate two GPUs and four CPUs for thirty minutes: salloc --cpus-per-gpu=2 --gpus=2 --time=30:00 --partition gpu_devel For more documentation on using GPUs on our clusters, please see GPUs and CUDA .","title":"Request Specific GPU Types"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#features-and-constraints","text":"You may want to run programs that require specific hardware. To ensure your job runs on specific types of nodes, use the --constraint flag. You can use the processor codename (e.g. haswell ) or processor type (e.g. E5-2660_v3 ) to limit your job to specific node types. You can also specify an instruction set (e.g. avx512 ) to require that no matter what CPU your job runs on, it must understand at least these instructions. See the individual cluster pages for the exact tags for the different node types. Multiple requirements (\"AND\") are separated by a comma ( , ) and multiple options (\"OR\") should be separated by the pipe character ( | ). 
# run on a node with a haswell codenamed CPU (e.g. a E5-2660 v3) sbatch --constraint = haswell submit.sh # only run on nodes with E5-2660 v4 CPUs sbatch --constraint = E5-2660_v4 submit.sh We also have keyword features to help you constrain your jobs to certain categories of nodes. oldest : the oldest generation of node on the cluster. Use this constraint when compiling code if you wish to ensure it can run on any standard node on the cluster. nogpu : nodes without GPUs. standard : nodes without GPUs or extra memory. Useful for protecting special nodes in a private partition for jobs that can use the extra capabilities. singleprecision : nodes with single-precision only capable GPUs (e.g. GTX 1080s, RTX 2080s). doubleprecision : nodes with double-precision capable GPUs (e.g. K80s, P100s and V100s). GPU type (e.g. v100 ): nodes with a specific type of GPU. bigtmp : nodes with at least 1.5T of local storage in /tmp . Useful to ensure that your code will have sufficient space if it uses local storage (e.g. Gaussian's $GAUSS_SCRDIR ). Tip Use the command scontrol show node , replacing with the node's name you're interested in, to see more information about the node including its features.","title":"Features and Constraints"},{"location":"clusters-at-yale/job-scheduling/resource-usage/","text":"Monitor CPU and Memory General Note Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more effeciently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM (aka memory) usage so you can make this happen. Be sure to check the Slurm documentation and the clusters page (especially the partitions and hardware sections) to make sure you are submitting the right jobs to the right hardware. Future Jobs If you launch a program by putting /usr/bin/time in front of it, time will watch your program and provide statistics about the resources it used. For example: [ netid@node ~ ] $ /usr/bin/time -v stress-ng --cpu 8 --timeout 10s stress-ng: info: [ 32574 ] dispatching hogs: 8 cpu stress-ng: info: [ 32574 ] successful run completed in 10 .08s Command being timed: \"stress-ng --cpu 8 --timeout 10s\" User time ( seconds ) : 80 .22 System time ( seconds ) : 0 .04 Percent of CPU this job got: 795 % Elapsed ( wall clock ) time ( h:mm:ss or m:ss ) : 0 :10.09 Average shared text size ( kbytes ) : 0 Average unshared data size ( kbytes ) : 0 Average stack size ( kbytes ) : 0 Average total size ( kbytes ) : 0 Maximum resident set size ( kbytes ) : 6328 Average resident set size ( kbytes ) : 0 Major ( requiring I/O ) page faults: 0 Minor ( reclaiming a frame ) page faults: 30799 Voluntary context switches: 1380 Involuntary context switches: 68 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 To know how much RAM your job used (and what jobs like it will need in the future), look at the \"Maximum resident set size\" Running Jobs If your job is already running, you can check on its usage, but will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. 
To find the node you should ssh to, run: [netid@node ~]$ squeue --me JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21252409 general 12345 netid R 32:17 17 c13n[02-04],c14n[05-10],c16n[03-10] Then use ssh to connect to a node your job is running on from the NODELIST column: [netid@node ~]$ ssh c13n03 [netid@c13n03 ~]$ Once you are on the compute node, run either ps or top . ps ps will give you instantaneous usage every time you run it. Here is some sample ps output: [netid@bigmem01 ~]$ ps -u$USER -o %cpu,rss,args %CPU RSS COMMAND 92.6 79446140 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 94.5 80758040 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.6 79676460 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.5 81243364 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 93.8 80799668 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask ps reports memory used in kilobytes, so each of the 5 matlab processes is using ~77GiB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs. top top runs interactively and shows you live usage statistics. You can press u , enter your netid, then enter to filter just your processes. For Memory usage, the number you are interested in is RES. In the case below, the YEPNEE.exe programs are each consuming ~600MB of memory and each fully utilizing one CPU. You can press ? for help and q to quit. ClusterShell For multi-node jobs clush can be very useful. Please see our guide on how to set up and use ClusterShell . Completed Jobs Slurm records statistics for every job, including how much memory and CPU was used. seff After the job completes, you can run seff to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to. [netid@node ~]$ seff 21294645 Job ID: 21294645 Cluster: mccleary User/Group: rdb9/support State: COMPLETED (exit code 0) Cores: 1 CPU Utilized: 00:15:55 CPU Efficiency: 17.04% of 01:33:23 core-walltime Job Wall-clock time: 01:33:23 Memory Utilized: 446.20 MB Memory Efficiency: 8.71% of 5.00 GiB seff-array For job arrays (see here for details) it is helpful to look at statistics for how resources are used by each element of the array. 
The seff-array tool takes the job ID of the array and then calculates the distribution and average CPU and memory usage: [netid@node ~]$ seff-array 43283382 ========== Max Memory Usage ========== # NumSamples = 90; Min = 896.29 MB; Max = 900.48 MB # Mean = 897.77 MB; Variance = 0.40 MB; SD = 0.63 MB; Median 897.78 MB # each \u220e represents a count of 1 806.6628 - 896.7108 MB [ 2]: \u220e\u220e 896.7108 - 897.1296 MB [ 9]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.1296 - 897.5484 MB [ 21]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.5484 - 897.9672 MB [ 34]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.9672 - 898.3860 MB [ 15]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 898.3860 - 898.8048 MB [ 4]: \u220e\u220e\u220e\u220e 898.8048 - 899.2236 MB [ 1]: \u220e 899.2236 - 899.6424 MB [ 3]: \u220e\u220e\u220e 899.6424 - 900.0612 MB [ 0]: 900.0612 - 900.4800 MB [ 1]: \u220e The requested memory was 2000MB. ========== Elapsed Time ========== # NumSamples = 90; Min = 00:03:25.0; Max = 00:07:24.0 # Mean = 00:05:45.0; SD = 00:01:39.0; Median 00:06:44.0 # each \u220e represents a count of 1 00:03:5.0 - 00:03:48.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:03:48.0 - 00:04:11.0 [ 0]: 00:04:11.0 - 00:04:34.0 [ 0]: 00:04:34.0 - 00:04:57.0 [ 0]: 00:04:57.0 - 00:05:20.0 [ 0]: 00:05:20.0 - 00:05:43.0 [ 0]: 00:05:43.0 - 00:06:6.0 [ 0]: 00:06:6.0 - 00:06:29.0 [ 0]: 00:06:29.0 - 00:06:52.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:06:52.0 - 00:07:15.0 [ 28]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e ******************************************************************************** The requested runtime was 01:00:00. The average runtime was 00:05:45.0. Requesting less time would allow jobs to run more quickly. ******************************************************************************** This shows how efficiently the resource request was for all the jobs in an array. In this example, we see that the average memory usage was just under 1GiB, which is reasonable for the 2GiB requested. However, the requested runtime was for an hour, while the jobs only ran for six minutes. These jobs could have been scheduled more quickly if a more accurate runtime was specified. sacct You can also use the more flexible sacct to get that info, along with other more advanced job queries. Unfortunately, the default output from sacct is not as useful. We recommend setting an environment variable to customize the output. 
[netid@node ~]$ export SACCT_FORMAT=\"JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32\" [netid@node ~]$ sacct -j 21294645 JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES -------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- -------------------------------- 21294645 bash rdb9 interacti+ c06n09 01:33:23 COMPLETED 0:0 cpu=1,mem=5G,node=1,billing=1 21294645.extern extern c06n09 01:33:23 COMPLETED 0:0 716K cpu=1,mem=5G,node=1,billing=1 21294645.0 bash c06n09 01:33:23 COMPLETED 0:0 456908K cpu=1,mem=5G,node=1 You should look at the MaxRSS value to see your memory usage.","title":"Monitor CPU and Memory"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#monitor-cpu-and-memory","text":"","title":"Monitor CPU and Memory"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#general-note","text":"Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more effeciently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM (aka memory) usage so you can make this happen. Be sure to check the Slurm documentation and the clusters page (especially the partitions and hardware sections) to make sure you are submitting the right jobs to the right hardware.","title":"General Note"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#future-jobs","text":"If you launch a program by putting /usr/bin/time in front of it, time will watch your program and provide statistics about the resources it used. For example: [ netid@node ~ ] $ /usr/bin/time -v stress-ng --cpu 8 --timeout 10s stress-ng: info: [ 32574 ] dispatching hogs: 8 cpu stress-ng: info: [ 32574 ] successful run completed in 10 .08s Command being timed: \"stress-ng --cpu 8 --timeout 10s\" User time ( seconds ) : 80 .22 System time ( seconds ) : 0 .04 Percent of CPU this job got: 795 % Elapsed ( wall clock ) time ( h:mm:ss or m:ss ) : 0 :10.09 Average shared text size ( kbytes ) : 0 Average unshared data size ( kbytes ) : 0 Average stack size ( kbytes ) : 0 Average total size ( kbytes ) : 0 Maximum resident set size ( kbytes ) : 6328 Average resident set size ( kbytes ) : 0 Major ( requiring I/O ) page faults: 0 Minor ( reclaiming a frame ) page faults: 30799 Voluntary context switches: 1380 Involuntary context switches: 68 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 To know how much RAM your job used (and what jobs like it will need in the future), look at the \"Maximum resident set size\"","title":"Future Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#running-jobs","text":"If your job is already running, you can check on its usage, but will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. 
To find the node you should ssh to, run: [netid@node ~]$ squeue --me JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21252409 general 12345 netid R 32:17 17 c13n[02-04],c14n[05-10],c16n[03-10] Then use ssh to connect to a node your job is running on from the NODELIST column: [netid@node ~]$ ssh c13n03 [netid@c13n03 ~]$ Once you are on the compute node, run either ps or top .","title":"Running Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#ps","text":"ps will give you instantaneous usage every time you run it. Here is some sample ps output: [netid@bigmem01 ~]$ ps -u$USER -o %cpu,rss,args %CPU RSS COMMAND 92.6 79446140 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 94.5 80758040 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.6 79676460 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.5 81243364 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 93.8 80799668 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask ps reports memory used in kilobytes, so each of the 5 matlab processes is using ~77GiB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs.","title":"ps"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#top","text":"top runs interactively and shows you live usage statistics. You can press u , enter your netid, then enter to filter just your processes. For Memory usage, the number you are interested in is RES. In the case below, the YEPNEE.exe programs are each consuming ~600MB of memory and each fully utilizing one CPU. You can press ? for help and q to quit.","title":"top"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#clustershell","text":"For multi-node jobs clush can be very useful. Please see our guide on how to set up and use ClusterShell .","title":"ClusterShell"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#completed-jobs","text":"Slurm records statistics for every job, including how much memory and CPU was used.","title":"Completed Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#seff","text":"After the job completes, you can run seff to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to. [netid@node ~]$ seff 21294645 Job ID: 21294645 Cluster: mccleary User/Group: rdb9/support State: COMPLETED (exit code 0) Cores: 1 CPU Utilized: 00:15:55 CPU Efficiency: 17.04% of 01:33:23 core-walltime Job Wall-clock time: 01:33:23 Memory Utilized: 446.20 MB Memory Efficiency: 8.71% of 5.00 GiB","title":"seff"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#seff-array","text":"For job arrays (see here for details) it is helpful to look at statistics for how resources are used by each element of the array. 
The seff-array tool takes the job ID of the array and then calculates the distribution and average CPU and memory usage: [netid@node ~]$ seff-array 43283382 ========== Max Memory Usage ========== # NumSamples = 90; Min = 896.29 MB; Max = 900.48 MB # Mean = 897.77 MB; Variance = 0.40 MB; SD = 0.63 MB; Median 897.78 MB # each \u220e represents a count of 1 806.6628 - 896.7108 MB [ 2]: \u220e\u220e 896.7108 - 897.1296 MB [ 9]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.1296 - 897.5484 MB [ 21]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.5484 - 897.9672 MB [ 34]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.9672 - 898.3860 MB [ 15]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 898.3860 - 898.8048 MB [ 4]: \u220e\u220e\u220e\u220e 898.8048 - 899.2236 MB [ 1]: \u220e 899.2236 - 899.6424 MB [ 3]: \u220e\u220e\u220e 899.6424 - 900.0612 MB [ 0]: 900.0612 - 900.4800 MB [ 1]: \u220e The requested memory was 2000MB. ========== Elapsed Time ========== # NumSamples = 90; Min = 00:03:25.0; Max = 00:07:24.0 # Mean = 00:05:45.0; SD = 00:01:39.0; Median 00:06:44.0 # each \u220e represents a count of 1 00:03:5.0 - 00:03:48.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:03:48.0 - 00:04:11.0 [ 0]: 00:04:11.0 - 00:04:34.0 [ 0]: 00:04:34.0 - 00:04:57.0 [ 0]: 00:04:57.0 - 00:05:20.0 [ 0]: 00:05:20.0 - 00:05:43.0 [ 0]: 00:05:43.0 - 00:06:6.0 [ 0]: 00:06:6.0 - 00:06:29.0 [ 0]: 00:06:29.0 - 00:06:52.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:06:52.0 - 00:07:15.0 [ 28]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e ******************************************************************************** The requested runtime was 01:00:00. The average runtime was 00:05:45.0. Requesting less time would allow jobs to run more quickly. ******************************************************************************** This shows how efficiently the resource request was for all the jobs in an array. In this example, we see that the average memory usage was just under 1GiB, which is reasonable for the 2GiB requested. However, the requested runtime was for an hour, while the jobs only ran for six minutes. These jobs could have been scheduled more quickly if a more accurate runtime was specified.","title":"seff-array"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#sacct","text":"You can also use the more flexible sacct to get that info, along with other more advanced job queries. Unfortunately, the default output from sacct is not as useful. We recommend setting an environment variable to customize the output. 
[netid@node ~]$ export SACCT_FORMAT=\"JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32\" [netid@node ~]$ sacct -j 21294645 JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES -------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- -------------------------------- 21294645 bash rdb9 interacti+ c06n09 01:33:23 COMPLETED 0:0 cpu=1,mem=5G,node=1,billing=1 21294645.extern extern c06n09 01:33:23 COMPLETED 0:0 716K cpu=1,mem=5G,node=1,billing=1 21294645.0 bash c06n09 01:33:23 COMPLETED 0:0 456908K cpu=1,mem=5G,node=1 You should look at the MaxRSS value to see your memory usage.","title":"sacct"},{"location":"clusters-at-yale/job-scheduling/scavenge/","text":"Scavenge Partition A scavenge partition is available on all of our clusters. It allows you to (a) run jobs outside of your normal limits (e.g. QOSMaxCpuPerUserLimit ) and (b) use unutilized cores, if available, in any private partition on the cluster. You can also use the scavenge partition to get access to unused cores in special purpose partitions, such as the \"gpu\" or \"mpi\" partitions, and unused GPUs in private partitions. However, any job running in the scavenge partition is subject to preemption if any node in use by the job is required for a job in the node's normal partition. This means that your job may be killed without advance notice, so you should only run jobs in the scavenge partition that either have checkpoint capabilities or that can otherwise be restarted with minimal loss of progress. Warning Not all jobs are a good fit for the scavenge partition, such as jobs with long startup times or jobs that run a long time between checkpoint operations. Automatically Requeue Preempted Jobs If you would like your job to be automatically added back to the queue if preempted, you can add the --requeue flag to your submission script. #SBATCH --requeue Be aware that your job, when started from a requeue, will still re-run the entire original submission script. It will only resume progress if your program has its own ability to checkpoint and restart from previous progress. Track History of a Requeued Job When a scavenge job is requeued after preemption, it retains the same job ID. However, this can make it difficult to track the history of the job (how many times it was requeued, how long it ran for each time). To view the full history of your job, use the --duplicates flag for the sacct command along with the job ID, for example: sacct -j 12345 --duplicates Scavenge GPUs On Grace and McCleary, we also have a scavenge_gpu partition that contains all scavenge-able GPU-enabled nodes and has higher priority for those nodes than normal scavenge. In all other ways (e.g. preemption, time limit), scavenge_gpu behaves the same as the normal scavenge partition. You can see the full count of GPU nodes in the Partition tables on the respective cluster pages. Scavenge MPI Nodes On Grace, we have a scavenge_mpi partition that contains all scavenge-able nodes similar to the mpi partition and has higher priority for those nodes than normal scavenge. scavenge_mpi is subject to the same preemption model as scavenge and the same use case restrictions as the regular mpi partition (multi-node, tightly coupled parallel codes). You can see the full count of MPI nodes in the Partition tables on the respective cluster pages. 
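Putting the points above together, a minimal sketch of a scavenge batch script that requeues itself after preemption might look like the following (this assumes the partition is requested by its name, scavenge, and my_checkpointed_job.sh is a hypothetical placeholder for a program that can resume from its own checkpoint files):
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --requeue
#SBATCH --time=1-00:00:00
#SBATCH --cpus-per-task=4
# the program itself must detect and resume from its previous checkpoints after a requeue
./my_checkpointed_job.sh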
Research Available Nodes If you are interested in specific hardware and its availability, you can use the sinfo command to query how many of each type of node is available and what features it lists. For example: sinfo -e -o \"%.6D|%c|%G|%b\" | column -ts \"|\" will show you the kinds of nodes available, and sinfo -e -o \"%.6D|%T|%c|%G|%b\" | column -ts \"|\" will break out how many nodes in each state (e.g. allocated, mixed, idle) there are. For more options see the official sinfo documentation .","title":"Scavenge Partition"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-partition","text":"A scavenge partition is available on all of our clusters. It allows you to (a) run jobs outside of your normal limits (e.g. QOSMaxCpuPerUserLimit ) and (b) use unutilized cores, if available, in any private partition on the cluster. You can also use the scavenge partition to get access to unused cores in special purpose partitions, such as the \"gpu\" or \"mpi\" partitions, and unused GPUs in private partitions. However, any job running in the scavenge partition is subject to preemption if any node in use by the job is required for a job in the node's normal partition. This means that your job may be killed without advance notice, so you should only run jobs in the scavenge partition that either have checkpoint capabilities or that can otherwise be restarted with minimal loss of progress. Warning Not all jobs are a good fit for the scavenge partition, such as jobs with long startup times or jobs that run a long time between checkpoint operations.","title":"Scavenge Partition"},{"location":"clusters-at-yale/job-scheduling/scavenge/#automatically-requeue-preempted-jobs","text":"If you would like your job to be automatically added back to the queue if preempted, you can add the --requeue flag to your submission script. #SBATCH --requeue Be aware that your job, when started from a requeue, will still re-run the entire original submission script. It will only resume progress if your program has the its own ability to checkpoint and restart from previous progress.","title":"Automatically Requeue Preempted Jobs"},{"location":"clusters-at-yale/job-scheduling/scavenge/#track-history-of-a-requeued-job","text":"When a scavenge job is requeued after preemption, it retains the same job id. However, this can make it difficult to track the history of the job (how many times it was requeued, how long it ran for each time). To view the full history of your job use the --duplicates flag for the sacct command. sacct -j --duplicates","title":"Track History of a Requeued Job"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-gpus","text":"On Grace and McCleary, we also have a scavenge_gpu partition, that contains all scavenge-able GPU enabled nodes and has higher priority for those node than normal scavenge. In all other ways (e.g. preemption, time limit), scavenge_gpu behaves the same as the normal scavenge partition. You can see the full count of GPU nodes in the Partition tables on the respective cluster pages.","title":"Scavenge GPUs"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-mpi-nodes","text":"On Grace, we have a scavenge_mpi partition, that contains all scavenge-able nodes similar to the mpi partition and has higher priority for those node than normal scavenge. scavenge_mpi is subject to the same preemption model as scavenge and the same use case restrictions as the regular mpi partition (multi-node, tightly couple parallel codes). 
You can see the full count of MPI nodes in the Partition tables on the respective cluster pages.","title":"Scavenge MPI Nodes"},{"location":"clusters-at-yale/job-scheduling/scavenge/#research-available-nodes","text":"If you are interested in specific hardware and its availability, you can use the sinfo command to query how many of each type of node is available and what features it lists. For example: sinfo -e -o \"%.6D|%c|%G|%b\" | column -ts \"|\" will show you the kinds of nodes available, and sinfo -e -o \"%.6D|%T|%c|%G|%b\" | column -ts \"|\" will break out how many nodes in each state (e.g. allocated, mixed, idle) there are. For more options see the official sinfo documentation .","title":"Research Available Nodes"},{"location":"clusters-at-yale/job-scheduling/scrontab/","text":"Recurring Jobs You can use scrontab to schedule recurring jobs. It uses a syntax similar to crontab , a standard Unix/Linux utility for running programs at specified intervals. scrontab vs crontab If you are familiar with crontab , there are some important differences to note: The scheduled times for scrontab indicate when your job is eligible to start. They are not start times like a traditional Cron jobs. Jobs managed with scrontab won't start if an earlier iteration of the same job is still running. Cron will happily run multiple copies of a job at the same time. You have one scrontab file for the entire cluster, unlike crontabs which are stored locally on each computer. Set Up Your scrontab Edit Your scrontab Run scrontab -e to edit your scrontab file. If you prefer to use nano to edit files, run EDITOR = nano scrontab -e Lines that start with #SCRON are treated like the beginning of a new batch job, and work like #SBATCH directives for batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most common sbatch options just as you would using sbatch on the command line . The first line after your SCRON directives specifies the schedule for your job and the command to run. Note All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir slurm option. Cron syntax Crontab syntax is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io . You can also use the short-hand syntax @hourly , @daily , @weekly , @monthly , and @yearly instead of the five separate columns. What to Run If you're running a script it must be marked as executable. Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file you need to add: source ~/.bashrc To your script to ensure that your environment is set up correctly. Note The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead. Note Your scrontab jobs will appear to have the same JobID every time they run until the next time you edit your scrontab file (they are being requeued). This means that only the most recent job will be logged to the default output file. If you want deeper history, you should redirect output in your scripts to filenames with something more unique in their names, like a date or timestamp, e.g. 
python my_script.py > $( date + \"%Y-%m-%d\" ) _myjob_scrontab.out If you want to see slurm accounting of a job handled by scrontab, for example job 12345 run: sacct --duplicates --jobs 12345 # or with short options sacct -Dj 12345 Examples Run a Daily Simulation This example submits a 6-hour simulation eligible to start every day at 12:00 AM. #SCRON --time 6:00:00 #SCRON --cpus-per-task 4 #SCRON --name \"daily_sim\" #SCRON --chdir /home/netid/project #SCRON -o my_simulations/%j-out.txt @daily ./simulation_v2_final.sh Run a Weekly Transfer Job This example submits a transfer script eligible to start every Wednesday at 8:00 PM. #SCRON --time 1:00:00 #SCRON --partition transfer #SCRON --chdir /home/netid/project/to_transfer #SCRON -o transfer_log_%j.txt 0 20 * * 3 ./rclone_commands.sh Capture output from each run in a separate file Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same jobid. This can be avoided using a redirect to a date-stamped file. 0 20 * * 3 ./commands.sh > myjob_ $( date +%Y%m%d%H%M ) .out","title":"Recurring Jobs"},{"location":"clusters-at-yale/job-scheduling/scrontab/#recurring-jobs","text":"You can use scrontab to schedule recurring jobs. It uses a syntax similar to crontab , a standard Unix/Linux utility for running programs at specified intervals. scrontab vs crontab If you are familiar with crontab , there are some important differences to note: The scheduled times for scrontab indicate when your job is eligible to start. They are not start times like a traditional Cron jobs. Jobs managed with scrontab won't start if an earlier iteration of the same job is still running. Cron will happily run multiple copies of a job at the same time. You have one scrontab file for the entire cluster, unlike crontabs which are stored locally on each computer.","title":"Recurring Jobs"},{"location":"clusters-at-yale/job-scheduling/scrontab/#set-up-your-scrontab","text":"","title":"Set Up Your scrontab"},{"location":"clusters-at-yale/job-scheduling/scrontab/#edit-your-scrontab","text":"Run scrontab -e to edit your scrontab file. If you prefer to use nano to edit files, run EDITOR = nano scrontab -e Lines that start with #SCRON are treated like the beginning of a new batch job, and work like #SBATCH directives for batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most common sbatch options just as you would using sbatch on the command line . The first line after your SCRON directives specifies the schedule for your job and the command to run. Note All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir slurm option.","title":"Edit Your scrontab"},{"location":"clusters-at-yale/job-scheduling/scrontab/#cron-syntax","text":"Crontab syntax is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io . You can also use the short-hand syntax @hourly , @daily , @weekly , @monthly , and @yearly instead of the five separate columns.","title":"Cron syntax"},{"location":"clusters-at-yale/job-scheduling/scrontab/#what-to-run","text":"If you're running a script it must be marked as executable. 
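For example, you could mark a script (here called my_script.sh as a placeholder name) as executable with: chmod +x my_script.sh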
Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file you need to add: source ~/.bashrc To your script to ensure that your environment is set up correctly. Note The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead. Note Your scrontab jobs will appear to have the same JobID every time they run until the next time you edit your scrontab file (they are being requeued). This means that only the most recent job will be logged to the default output file. If you want deeper history, you should redirect output in your scripts to filenames with something more unique in their names, like a date or timestamp, e.g. python my_script.py > $( date + \"%Y-%m-%d\" ) _myjob_scrontab.out If you want to see slurm accounting of a job handled by scrontab, for example job 12345 run: sacct --duplicates --jobs 12345 # or with short options sacct -Dj 12345","title":"What to Run"},{"location":"clusters-at-yale/job-scheduling/scrontab/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/job-scheduling/scrontab/#run-a-daily-simulation","text":"This example submits a 6-hour simulation eligible to start every day at 12:00 AM. #SCRON --time 6:00:00 #SCRON --cpus-per-task 4 #SCRON --name \"daily_sim\" #SCRON --chdir /home/netid/project #SCRON -o my_simulations/%j-out.txt @daily ./simulation_v2_final.sh","title":"Run a Daily Simulation"},{"location":"clusters-at-yale/job-scheduling/scrontab/#run-a-weekly-transfer-job","text":"This example submits a transfer script eligible to start every Wednesday at 8:00 PM. #SCRON --time 1:00:00 #SCRON --partition transfer #SCRON --chdir /home/netid/project/to_transfer #SCRON -o transfer_log_%j.txt 0 20 * * 3 ./rclone_commands.sh","title":"Run a Weekly Transfer Job"},{"location":"clusters-at-yale/job-scheduling/scrontab/#capture-output-from-each-run-in-a-separate-file","text":"Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same jobid. This can be avoided using a redirect to a date-stamped file. 0 20 * * 3 ./commands.sh > myjob_ $( date +%Y%m%d%H%M ) .out","title":"Capture output from each run in a separate file"},{"location":"clusters-at-yale/job-scheduling/simplequeue/","text":"SimpleQueue SimpleQueue is a tool written here to streamline submission of a large number of jobs using a task file. It has a number of advantages: You can run more of your sequential jobs concurrently, since there is a limit on the number of individual qsubs you can run simultaneously. You only have one job to keep track of. If you need to shut everything down, you only need to kill one job. SimpleQueue keeps track of the status of individual jobs. Note that version 3.0+ of SimpleQueue differs from earlier versions in important ways, in particular the meaning of -n. If you have been using an earlier version, please read the following carefully! SimpleQueue is available as a module on our clusters. Run: module avail simplequeue to locate the simplequeue module on your cluster of choice. Example SimpleQueue Job For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . 
Given some initial testing, you think that 80 cpus working together will be enough to finish the job in a reasonable time. Step 1: Create Task List The first step is to create a file with a list of the \"tasks\" you want to run. Each task corresponds to what you might otherwise have run as a single job. A task can be a simple command invocation, or a sequence of commands. You can call the task file anything, but for this example assume it's called \"tasklist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 For simplicity, we'll assume that tasklist, input fastq files, and indexed genome are in a directory called ~/genome_proj/mapping . Step 2: Create Submission Script Load the SimpleQueue module, then create the launch script using: sqCreateScript -q general -N genome_map -n 80 tasklist.txt > run.sh These parameters specify that the job, named genome_map, will be submitted to the general queue/partition. This job will find 80 free cores, start 80 workers on them, and begin processing tasks from the taskfile tasklist.txt . sqCreateScript takes a number of options. They differ somewhat from cluster to cluster, particularly the default values for queue, walltime, and memory. You can run sqCreateScript without any arguments to see the exact options on your cluster. Usage: -h, --help show this help message and exit -n WORKERS, --workers=WORKERS Number of workers to use. Not required. Defaults to 1. -c CORES, --cores=CORES Number of cores to request per worker. Defaults to 1. -m MEM, --mem=MEM Memory per worker. Not required. Defaults to 1G -w WALLTIME, --walltime=WALLTIME Walltime to request for the Slurm Job in form [[D-]HH:]MM:SS. Not required. Defaults to 1:00:00. -q QUEUE, --queue=QUEUE Name of queue to use. Not required. Defaults to general -N NAME, --name=NAME Base job name to use. Not required. Defaults to SimpleQueue. --logdir=LOGDIR Name of logging directory. Defaults to SQ_Files_${SLURM_JOB_ID}. Step 3: Submit Your Job Now you can simply submit run.sh to the scheduler. All of the important scheduler options (queue, number of tasks, number of cpus per task) will have been set in the script so you needn't worry about them. Shortly after run.sh begins running, you should see a directory appear called SQ_Files_jobid where jobid is the jobid the scheduler assigned your job. This directory contains logs from all the tasks that are run during your job. In addition, there are a few other files that record information about the job as a whole. Of these, the most important one is SQ.log . It should be reviewed if you encounter a problem with a run. Assuming that all goes well, tasks from the tasklist file will be scheduled automatically onto the cpus you acquired until all the tasks have completed. At that time, the job will terminate, and you'll see several summary files: scheduler_jobid_out.txt : this is the stdout from simple queue proper (it is generally empty). 
scheduler_jobid_err.txt : this is the stderr from simple queue proper (it is generally a copy of SQ.log ). tasklist.txt.STATUS : this contains a list of all the tasks that were run, including exit status, start time, end time, pid, node run on, and the command run. tasklist.txt.REMAINING : Failed or uncompleted tasks will be listed in this file in the same format as tasklist, so that those tasks can be easily rerun. You should review the status files related to these tasks to understand why they did not complete. This list is provided for convenience. It is always a good idea to scan tasklist.STATUS to double check which tasks did in fact complete with a normal exit status. tasklist.txt.ROGUES : The simple queue system attempts to ensure that all tasks launched eventually exit (normally or abnormally). If it fails to get confirmation that a task has exited, information about the command will be written to this file. This information can be used to hunt down and kill run away processes. Other Important Options If your individual tasks need more than the default memory allocated on your cluster, you can specify a different value using -m. For example: sqCreateScript -m 10g -n 4 ... tasklist > run.sh would request 10GiB of RAM for each of your workers. If your jobs are themselves multithreaded, you can request that your workers have multiple cores using the -c option: sqCreateScript -c 20 -n 4 ... tasklist > run.sh This would create 4 workers, each having access to 20 cores.","title":"SimpleQueue"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#simplequeue","text":"SimpleQueue is a tool written here to streamline submission of a large number of jobs using a task file. It has a number of advantages: You can run more of your sequential jobs concurrently, since there is a limit on the number of individual qsubs you can run simultaneously. You only have one job to keep track of. If you need to shut everything down, you only need to kill one job. SimpleQueue keeps track of the status of individual jobs. Note that version 3.0+ of SimpleQueue differs from earlier versions in important ways, in particular the meaning of -n. If you have been using an earlier version, please read the following carefully! SimpleQueue is available as a module on our clusters. Run: module avail simplequeue to locate the simplequeue module on your cluster of choice.","title":"SimpleQueue"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#example-simplequeue-job","text":"For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that 80 cpus working together will be enough to finish the job in a reasonable time.","title":"Example SimpleQueue Job"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-1-create-task-list","text":"The first step is to create a file with a list of the \"tasks\" you want to run. Each task corresponds to what you might otherwise have run as a single job. A task can be a simple command invocation, or a sequence of commands. 
You can call the task file anything, but for this example assume it's called \"tasklist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 For simplicity, we'll assume that tasklist, input fastq files, and indexed genome are in a directory called ~/genome_proj/mapping .","title":"Step 1: Create Task List"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-2-create-submission-script","text":"Load the SimpleQueue module, then create the launch script using: sqCreateScript -q general -N genome_map -n 80 tasklist.txt > run.sh These parameters specify that the job, named genome_map, will be submitted to the general queue/partition. This job will find 80 free cores, start 80 workers on them, and begin processing tasks from the taskfile tasklist.txt . sqCreateScript takes a number of options. They differ somewhat from cluster to cluster, particularly the default values for queue, walltime, and memory. You can run sqCreateScript without any arguments to see the exact options on your cluster. Usage: -h, --help show this help message and exit -n WORKERS, --workers=WORKERS Number of workers to use. Not required. Defaults to 1. -c CORES, --cores=CORES Number of cores to request per worker. Defaults to 1. -m MEM, --mem=MEM Memory per worker. Not required. Defaults to 1G -w WALLTIME, --walltime=WALLTIME Walltime to request for the Slurm Job in form [[D-]HH:]MM:SS. Not required. Defaults to 1:00:00. -q QUEUE, --queue=QUEUE Name of queue to use. Not required. Defaults to general -N NAME, --name=NAME Base job name to use. Not required. Defaults to SimpleQueue. --logdir=LOGDIR Name of logging directory. Defaults to SQ_Files_${SLURM_JOB_ID}.","title":"Step 2: Create Submission Script"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-3-submit-your-job","text":"Now you can simply submit run.sh to the scheduler. All of the important scheduler options (queue, number of tasks, number of cpus per task) will have been set in the script so you needn't worry about them. Shortly after run.sh begins running, you should see a directory appear called SQ_Files_jobid where jobid is the jobid the scheduler assigned your job. This directory contains logs from all the tasks that are run during your job. In addition, there are a few other files that record information about the job as a whole. Of these, the most important one is SQ.log . It should be reviewed if you encounter a problem with a run. Assuming that all goes well, tasks from the tasklist file will be scheduled automatically onto the cpus you acquired until all the tasks have completed. At that time, the job will terminate, and you'll see several summary files: scheduler_jobid_out.txt : this is the stdout from simple queue proper (it is generally empty). scheduler_jobid_err.txt : this is the stderr from simple queue proper (it is generally a copy of SQ.log ). 
tasklist.txt.STATUS : this contains a list of all the tasks that were run, including exit status, start time, end time, pid, node run on, and the command run. tasklist.txt.REMAINING : Failed or uncompleted tasks will be listed in this file in the same format as tasklist, so that those tasks can be easily rerun. You should review the status files related to these tasks to understand why they did not complete. This list is provided for convenience. It is always a good idea to scan tasklist.STATUS to double check which tasks did in fact complete with a normal exit status. tasklist.txt.ROGUES : The simple queue system attempts to ensure that all tasks launched eventually exit (normally or abnormally). If it fails to get confirmation that a task has exited, information about the command will be written to this file. This information can be used to hunt down and kill run away processes.","title":"Step 3: Submit Your Job"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#other-important-options","text":"If your individual tasks need more than the default memory allocated on your cluster, you can specify a different value using -m. For example: sqCreateScript -m 10g -n 4 ... tasklist > run.sh would request 10GiB of RAM for each of your workers. If your jobs are themselves multithreaded, you can request that your workers have multiple cores using the -c option: sqCreateScript -c 20 -n 4 ... tasklist > run.sh This would create 4 workers, each having access to 20 cores.","title":"Other Important Options"},{"location":"clusters-at-yale/job-scheduling/slurm-account/","text":"Slurm Account Coordinator On the clusters the YCRC maintains, we map your linux user and group to your Slurm user and account, which is what actually gives you permission to submit to the various partitions available on the clusters. By changing the Slurm accounts associated with your user, you can modify access to partitions. As a coordinator of an account, you have permission to modify users' association with that account and modify jobs running that are associated with that account. Below are some useful example commands where we use an example user with the name \"be59\" where you are the coordinator of the slurm account \"cryoem\". Add/Remove Users From an Account sacctmgr add user be59 account = cryoem # add user sacctmgr remove user where user = be59 and account = cryoem # remove user Show Account Info sacctmgr show assoc user = be59 # show user associations sacctmgr show assoc account = cryoem # show assocations for account Submit Jobs salloc -A cryoem ... sbatch -A cryoem my_script.sh List Jobs squeue -A cryoem # by account squeue -u be59 # by user Cancel Jobs scancel 1234 # by job ID scancel -u be59 # kill all jobs by user scancel -u be59 --state = running # kill running jobs by user scancel -u be59 --state = pending # kill pending jobs by user scancel -A cryoem # kill all jobs in the account Hold and Release Jobs scontrol hold 1234 # by job ID scontrol release 1234 # remove the hold scontrol uhold 1234 # hold job 1234 but allow the job's owner to release it","title":"Slurm Account Coordinator"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#slurm-account-coordinator","text":"On the clusters the YCRC maintains, we map your linux user and group to your Slurm user and account, which is what actually gives you permission to submit to the various partitions available on the clusters. By changing the Slurm accounts associated with your user, you can modify access to partitions. 
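As a quick check, you can list the Slurm accounts currently associated with your own user (and therefore the partitions you can submit to) with something like: sacctmgr show assoc user=$USER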
As a coordinator of an account, you have permission to modify users' association with that account and modify jobs running that are associated with that account. Below are some useful example commands where we use an example user with the name \"be59\" where you are the coordinator of the slurm account \"cryoem\".","title":"Slurm Account Coordinator"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#addremove-users-from-an-account","text":"sacctmgr add user be59 account = cryoem # add user sacctmgr remove user where user = be59 and account = cryoem # remove user","title":"Add/Remove Users From an Account"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#show-account-info","text":"sacctmgr show assoc user = be59 # show user associations sacctmgr show assoc account = cryoem # show assocations for account","title":"Show Account Info"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#submit-jobs","text":"salloc -A cryoem ... sbatch -A cryoem my_script.sh","title":"Submit Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#list-jobs","text":"squeue -A cryoem # by account squeue -u be59 # by user","title":"List Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#cancel-jobs","text":"scancel 1234 # by job ID scancel -u be59 # kill all jobs by user scancel -u be59 --state = running # kill running jobs by user scancel -u be59 --state = pending # kill pending jobs by user scancel -A cryoem # kill all jobs in the account","title":"Cancel Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#hold-and-release-jobs","text":"scontrol hold 1234 # by job ID scontrol release 1234 # remove the hold scontrol uhold 1234 # hold job 1234 but allow the job's owner to release it","title":"Hold and Release Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/","text":"Submission Script Examples In addition to those below, we have additional example submission scripts for Parallel R, Matlab and Python . Single threaded programs (basic) #!/bin/bash #SBATCH --job-name=my_job #SBATCH --time=10:00 ./hello.omp Multi-threaded programs #!/bin/bash #SBATCH --job-name=omp_job #SBATCH --output=omp_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./hello.omp Multi-process programs #!/bin/bash #SBATCH --job-name=mpi #SBATCH --output=mpi_job.txt #SBATCH --ntasks=4 #SBATCH --time=10:00 mpirun hello.mpi Tip On Grace's mpi partition, try to make ntasks equal to a multiple of 24. 
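As an illustration of that tip (assuming the mpi partition's nodes each provide 24 cores, which is what the multiple-of-24 guidance implies), a two-node job could request 48 tasks like this:
#!/bin/bash
#SBATCH --job-name=mpi
#SBATCH --partition=mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=10:00
mpirun hello.mpi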
Hybrid (MPI+OpenMP) programs #!/bin/bash #SBATCH --job-name=hybrid #SBATCH --output=hydrid_job.txt #SBATCH --ntasks=8 #SBATCH --cpus-per-task=5 #SBATCH --nodes=2 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK mpirun hello_hybrid.mpi GPU job #!/bin/bash #SBATCH --job-name=deep_learn #SBATCH --output=gpu_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=2 #SBATCH --gpus=p100:2 #SBATCH --partition=gpu #SBATCH --time=10:00 module load CUDA module load cuDNN # using your anaconda environment source activate deep-learn python my_tensorflow.py","title":"Submission Script Examples"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#submission-script-examples","text":"In addition to those below, we have additional example submission scripts for Parallel R, Matlab and Python .","title":"Submission Script Examples"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#single-threaded-programs-basic","text":"#!/bin/bash #SBATCH --job-name=my_job #SBATCH --time=10:00 ./hello.omp","title":"Single threaded programs (basic)"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#multi-threaded-programs","text":"#!/bin/bash #SBATCH --job-name=omp_job #SBATCH --output=omp_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./hello.omp","title":"Multi-threaded programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#multi-process-programs","text":"#!/bin/bash #SBATCH --job-name=mpi #SBATCH --output=mpi_job.txt #SBATCH --ntasks=4 #SBATCH --time=10:00 mpirun hello.mpi Tip On Grace's mpi partition, try to make ntasks equal to a multiple of 24.","title":"Multi-process programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#hybrid-mpiopenmp-programs","text":"#!/bin/bash #SBATCH --job-name=hybrid #SBATCH --output=hydrid_job.txt #SBATCH --ntasks=8 #SBATCH --cpus-per-task=5 #SBATCH --nodes=2 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK mpirun hello_hybrid.mpi","title":"Hybrid (MPI+OpenMP) programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#gpu-job","text":"#!/bin/bash #SBATCH --job-name=deep_learn #SBATCH --output=gpu_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=2 #SBATCH --gpus=p100:2 #SBATCH --partition=gpu #SBATCH --time=10:00 module load CUDA module load cuDNN # using your anaconda environment source activate deep-learn python my_tensorflow.py","title":"GPU job"},{"location":"data/","text":"Data Storage Below we highlight some data storage option at Yale that are appropriate for research data. For a more complete list of data storage options, see the Storage Finder . If you have questions about selecting an appropriate home for your data, contact us for assistance. HPC Cluster Storage Capacity: Varies. Cost: Varies Sensitive data is only allowed on the Milgram cluster Only available on YCRC HPC clusters Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Details can be found on our Cluster Storage page. Additional project-style storage allocations can be purchased. See here for more information. Google Drive via EliApps Warning Changes to Google Drive pricing ITS has informed us of a number of changes to the EliApps Google Drive quotas, including shared drives. 
As of 8/15/23, all new EliApps accounts will have a free quota of 5GB. As of 7/1/24, all existing EliApps accounts will have a free quota of 5GB. Quotas beyond 5GB will be available for $145/TB/yr Therefore, you should probably not consider Google Drive on EliApps for storage large amounts of data. ITS suggested alternatives are Storage@Yale, Teams/SharePoint, or DropBox. Capacity: 400,000 file count quota, 5TiB max file size. Cost: Free No sensitive data (e.g. ePHI, HIPAA) Can be mounted on your local machine and transferred to via Globus Google Drive Connector Google Drive is a cloud service for file storage, document editing and sharing. All members of the Yale community with an EliApps (Google Workspace for Education) account have storage at no cost in the associated Google Drive account. Moreover, EliApps users can request Shared Drives, which are shared spaces where all files are group-owned. For more information on Google Drive through EliApps, see our Google Drive documentation . Storage @ Yale Capacity: As requested. Cost: See below No sensitive data (e.g. ePHI, HIPAA) for cluster mounts Can be mounted on the cluster or computers on campus (but not both) Storage @ Yale (S@Y) is a central storage service provided by ITS. S@Y shares can either be accessible on campus computers or the clusters, but not both. Type Use Object Tier Good for staging data between cloud and clusters Active Tier Daily use, still copy to cluster before using in jobs Archive Tier Long term storage, low access. Make sure to properly archive Backup Tier Low-access remote object backup. Make sure to properly archive For pricing information, see the ITS Data Rates . All prices are charged monthly for storage used at that time. To request a share, press the \u201cRequest this Service\u201d button in the right sidebar on the Storage@Yale website . If you would like to request a share that is mounted on the clusters, specify in your request that the share be mounted from the HPC clusters . If you elect to use archive tier storage, be cognizant of its performance characteristics . Cluster I/O Performance Since cluster-mounted S@Y shares do not provide sufficient performance for use in jobs, they are not mounted on our compute or login nodes. To access S@Y on the clusters, connect to one of the transfer nodes to stage the data to Project or Scratch60 before running jobs. Microsoft Teams/SharePoint Capacity: 25 TB, 250 GB per file. Cost: Free You can request a Team and 25TiB of underlying SharePoint storage space from ITS Email And Collaboration Services . For more information on The relationship between Teams, SharePoint, and OneDrive, see the official Microsoft post on the subject . Dropbox at Yale ITS offers departmental subscriptions to DropBox for a low cost (currently $23.66/user/year). Unlimited storage (take this with a grain of salt) Low risk data only For more information about DropBox at Yale, see the ITS website. Box at Yale Capacity: 50GiB per user. Cost: Free. 15 GiB max file size. Sensitive data (e.g. ePHI, HIPAA) only in Secure Box Can be mounted on your local machine and transferred with rclone All members of the Yale community have access to a share at Box at Yale. Box is another cloud-based file sharing and storage service. You can upload and access your data using the web portal and sync data with your local machines via Box Sync. To access, navigate to yale.box.com and login with your yale.edu account. For sync with your local machine, install Box Sync and authenticate with your yale.edu account. 
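As a rough sketch of a command-line transfer with rclone (this assumes you have already configured an rclone remote for your Box account; the remote name yalebox and the paths here are hypothetical): rclone copy ./results_to_backup yalebox:results_backup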
For more information about Box at Yale, see the ITS website. To learn more about these options, see the Yale Collaboration Counts page available through Yale ITS for details.","title":"Data Storage"},{"location":"data/#data-storage","text":"Below we highlight some data storage option at Yale that are appropriate for research data. For a more complete list of data storage options, see the Storage Finder . If you have questions about selecting an appropriate home for your data, contact us for assistance.","title":"Data Storage"},{"location":"data/#hpc-cluster-storage","text":"Capacity: Varies. Cost: Varies Sensitive data is only allowed on the Milgram cluster Only available on YCRC HPC clusters Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Details can be found on our Cluster Storage page. Additional project-style storage allocations can be purchased. See here for more information.","title":"HPC Cluster Storage"},{"location":"data/#google-drive-via-eliapps","text":"Warning Changes to Google Drive pricing ITS has informed us of a number of changes to the EliApps Google Drive quotas, including shared drives. As of 8/15/23, all new EliApps accounts will have a free quota of 5GB. As of 7/1/24, all existing EliApps accounts will have a free quota of 5GB. Quotas beyond 5GB will be available for $145/TB/yr Therefore, you should probably not consider Google Drive on EliApps for storage large amounts of data. ITS suggested alternatives are Storage@Yale, Teams/SharePoint, or DropBox. Capacity: 400,000 file count quota, 5TiB max file size. Cost: Free No sensitive data (e.g. ePHI, HIPAA) Can be mounted on your local machine and transferred to via Globus Google Drive Connector Google Drive is a cloud service for file storage, document editing and sharing. All members of the Yale community with an EliApps (Google Workspace for Education) account have storage at no cost in the associated Google Drive account. Moreover, EliApps users can request Shared Drives, which are shared spaces where all files are group-owned. For more information on Google Drive through EliApps, see our Google Drive documentation .","title":"Google Drive via EliApps"},{"location":"data/#storage-yale","text":"Capacity: As requested. Cost: See below No sensitive data (e.g. ePHI, HIPAA) for cluster mounts Can be mounted on the cluster or computers on campus (but not both) Storage @ Yale (S@Y) is a central storage service provided by ITS. S@Y shares can either be accessible on campus computers or the clusters, but not both. Type Use Object Tier Good for staging data between cloud and clusters Active Tier Daily use, still copy to cluster before using in jobs Archive Tier Long term storage, low access. Make sure to properly archive Backup Tier Low-access remote object backup. Make sure to properly archive For pricing information, see the ITS Data Rates . All prices are charged monthly for storage used at that time. To request a share, press the \u201cRequest this Service\u201d button in the right sidebar on the Storage@Yale website . If you would like to request a share that is mounted on the clusters, specify in your request that the share be mounted from the HPC clusters . If you elect to use archive tier storage, be cognizant of its performance characteristics . 
Cluster I/O Performance Since cluster-mounted S@Y shares do not provide sufficient performance for use in jobs, they are not mounted on our compute or login nodes. To access S@Y on the clusters, connect to one of the transfer nodes to stage the data to Project or Scratch60 before running jobs.","title":"Storage @ Yale"},{"location":"data/#microsoft-teamssharepoint","text":"Capacity: 25 TB, 250 GB per file. Cost: Free You can request a Team and 25TiB of underlying SharePoint storage space from ITS Email And Collaboration Services . For more information on The relationship between Teams, SharePoint, and OneDrive, see the official Microsoft post on the subject .","title":"Microsoft Teams/SharePoint"},{"location":"data/#dropbox-at-yale","text":"ITS offers departmental subscriptions to DropBox for a low cost (currently $23.66/user/year). Unlimited storage (take this with a grain of salt) Low risk data only For more information about DropBox at Yale, see the ITS website.","title":"Dropbox at Yale"},{"location":"data/#box-at-yale","text":"Capacity: 50GiB per user. Cost: Free. 15 GiB max file size. Sensitive data (e.g. ePHI, HIPAA) only in Secure Box Can be mounted on your local machine and transferred with rclone All members of the Yale community have access to a share at Box at Yale. Box is another cloud-based file sharing and storage service. You can upload and access your data using the web portal and sync data with your local machines via Box Sync. To access, navigate to yale.box.com and login with your yale.edu account. For sync with your local machine, install Box Sync and authenticate with your yale.edu account. For more information about Box at Yale, see the ITS website. To learn more about these options, see the Yale Collaboration Counts page available through Yale ITS for details.","title":"Box at Yale"},{"location":"data/archive/","text":"Archive Your Data Clean Out Unnecessary Files Not every file created during a project needs to be archived. If you proactively reduce the number of extraneous files in your archive, you will both reduce storage costs and increase the usefulness of that data upon retrieval. Common files that can be deleted when archiving data include: Compiled codes, such as .o or .pyc files. These files will likely not even work on the next system you may restore these data to and they can contribute significantly to your file count limit. Just keep the source code and clean installation instructions. Some log files. Many log created by the system are not necessary to store indefinitely. Any Slurm logs from failed runs (prior to a successful run) or outputs from Matlab (e.g. hs_error_pid*.log , java.log.* ) can often safely be ignored. Crash files such are core dumps (e.g. core.* , matlab_crash_dump. ). Compress Your Data Most archive locations (S@Y Archive Tier, Google Drive) perform much better with a smaller number of larger files. In fact, Google Shared Drives have a file count limit of 400,000 files. Therefore, it is highly recommended that your compress, using zip or tar , portions of your data for ease of storage and retrieval. For example, to create a compressed archive of a directory you can do the following: tar -cvzf archive-2021-04-26.tar.gz ./data_for_archival This will create a new file ( archive-2021-04-26.tar.gz ) which contains all the data from within data_for_archival and is compressed to minimize storage requirements. This file can then be transferred to any off-site backup or archive location. 
List and Extract Data From Existing Archive You can list the contents of an archive file like this: tar -ztvf archive-2021-04-26.tar.gz which will print the full list of every file within the archive. The clusters also have the lz tool installed that provides a shorter way to list the contents: lz archive-2021-04-26.tar.gz You can then extract a single file from a large tar-file without decompressing the full thing: tar -zxvf archive-2021-04-26.tar.gz path/to/file.txt There is an alternative syntax that is more legible: tar --extract --file=archive-2021-04-26.tar.gz file.txt Either should work fine on the clusters. Tips for S@Y Archive Tier The archive tier of Storage@Yale is a cloud-based system. It provides an archive location for long-term data, featuring professional systems management, security, and protection from data loss via redundant, enterprise-grade hardware. Data is dual-written to two locations. The cost per TB is substantially lower than for the active-access S@Y tier. For current pricing, see ITS Data Rates . To use S@Y (Archive) effectively, you need to be aware of how it works and follow some best practices. Note Just as for the S@Y Active Tier , direct access from the cluster should be specified when requesting the share. Direct access from the cluster is only authorized for Low and Moderate risk data. When you write to the archive, you are actually copying to a large hard disk-based cache, so writes are normally fast. Your copy will appear to complete as soon as the file is in the disk cache. It is NOT yet in the cloud. In the background, the system will flush files to the cloud and delete them from the cache. If you read a file very soon after you write it, it is probably still in the cache, and your read will be quick. However, once some time has elapsed and the file has been moved to the cloud, read speed will be somewhat slower. Note S@Y Archive has a single-filesize limit of 5 TB, so plan your data compressions accordingly. Some key takeaways: Operations that only read the metadata of files will be fast (ls, find, etc.) even if the file is in the cloud, since metadata is kept in the disk cache. Operations that actually read the file (cp, wc -l, tar, etc.) will require recovering the entire file to disk cache first, and can take several minutes or longer depending on how busy the system is. If many files will need to be recovered together, it is much better to store them as a single file first with tar or zip, then write that file to the archive. Please do NOT write huge numbers of small files. They will be difficult or impossible to restore in large numbers. Please do NOT do repetitive operations like rsyncs to the archive, since they overload the system. S@Y Backup Tier Yale ITS offers dedicated offsite \"S3\"-style object storage for data backup and archive to the cloud. Clients are responsible for the data transfers and recovery via the S3 protocol, such as by using RClone . The Backup Tier is authorized for Low, Moderate, and High Risk data. As with the Archive Tier, the Backup Tier is low-speed and not meant for daily use. For current pricing, see ITS Data Rates .","title":"Archive Your Data"},{"location":"data/archive/#archive-your-data","text":"","title":"Archive Your Data"},{"location":"data/archive/#clean-out-unnecessary-files","text":"Not every file created during a project needs to be archived.
If you proactively reduce the number of extraneous files in your archive, you will both reduce storage costs and increase the usefulness of that data upon retrieval. Common files that can be deleted when archiving data include: Compiled codes, such as .o or .pyc files. These files will likely not even work on the next system you may restore these data to and they can contribute significantly to your file count limit. Just keep the source code and clean installation instructions. Some log files. Many log created by the system are not necessary to store indefinitely. Any Slurm logs from failed runs (prior to a successful run) or outputs from Matlab (e.g. hs_error_pid*.log , java.log.* ) can often safely be ignored. Crash files such are core dumps (e.g. core.* , matlab_crash_dump. ).","title":"Clean Out Unnecessary Files"},{"location":"data/archive/#compress-your-data","text":"Most archive locations (S@Y Archive Tier, Google Drive) perform much better with a smaller number of larger files. In fact, Google Shared Drives have a file count limit of 400,000 files. Therefore, it is highly recommended that your compress, using zip or tar , portions of your data for ease of storage and retrieval. For example, to create a compressed archive of a directory you can do the following: tar -cvzf archive-2021-04-26.tar.gz ./data_for_archival This will create a new file ( archive-2021-04-26.tar.gz ) which contains all the data from within data_for_archival and is compressed to minimize storage requirements. This file can then be transferred to any off-site backup or archive location.","title":"Compress Your Data"},{"location":"data/archive/#list-and-extract-data-from-existing-archive","text":"You can list the contents of an archive file like this: tar -ztvf archive-2021-04-26.tar.gz which will print the full list of every file within the archive. The clusters also have the lz tool installed that provides a shorter way to list the contents: lz archive-2021-04-26.tar.gz You can then extract a single file from a large tar-file without decompressing the full thing: tar -zxvf archive-2021-04-26.tar.gz path/to/file.txt There is an alternative syntax that is more legible: tar --extract --file = archive-2021-04-26.tar.gz file.txt Either should work fine on the clusters.","title":"List and Extract Data From Existing Archive"},{"location":"data/archive/#tips-for-sy-archive-tier","text":"The archive tier of Storage@Yale is a cloud-based system. It provides an archive location for long-term data, featuring professional systems management, security, and protection from data loss via redundant, enterprise-grade hardware. Data is dual-written to two locations. The cost per TB is subtantially lower than for the active-access S@Y tier. For current pricing, see ITS Data Rates . To use S@Y (Archive) effectively, you need to be aware of how it works and follow some best practices. Note Just as for the S@Y Active Tier , direct access from the cluster should be specified when requesting the share. Direct access from the cluster is only authorized for Low and Moderate risk data. When you write to the archive, you are actually copying to a large hard disk-based cache, so writes are normally fast. Your copy will appear to complete as soon as the file is in the disk cache. It is NOT yet in the cloud. In the background, the system will flush files to the cloud and delete them from the cache. If you read a file very soon after you write it, it is probably still in the cache, and your read will be quick. 
However, once some time has elapsed and the file has been moved to the cloud, read speed will be somewhat slower. Note S@Y Archive has a single-filesize limit of 5 TB, so plan your data compressions accordingly. Some key takeaways: Operations that only read the metadata of files will be fast (ls, find, etc.) even if the file is in the cloud, since metadata is kept in the disk cache. Operations that actually read the file (cp, wc -l, tar, etc.) will require recovering the entire file to disk cache first, and can take several minutes or longer depending on how busy the system is. If many files will need to be recovered together, it is much better to store them as a single file first with tar or zip, then write that file to the archive. Please do NOT write huge numbers of small files. They will be difficult or impossible to restore in large numbers. Please do NOT do repetitive operations like rsyncs to the archive, since they overload the system.","title":"Tips for S@Y Archive Tier"},{"location":"data/archive/#sy-backup-tier","text":"Yale ITS offers dedicated offsite \"S3\"-style object storage for data backup and archive to the cloud. Clients are responsible for the data transfers and recovery via the S3 protocol, such as by using RClone . The Backup Tier is authorized for Low, Moderate, and High Risk data. As with the Archive Tier, the Backup Tier is low-speed and not meant for daily use. For current pricing, see ITS Data Rates .","title":"S@Y Backup Tier"},{"location":"data/archived-sequencing/","text":"YCGA Sequence Data Archive Retrieve Data from the Archive In the sequencing archive on McCleary , a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists on tape or in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of a file, it is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/sequencers /ycga-ba/ba_sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ba_sequencers /gpfs/ycga/sequencers/illumina/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencers You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files (over 500GB) sometimes fail to download. If you run into problems, contact us at hpc@yale.edu and we can manually download it. cd ~/scratch60/somedir tar -xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip -d M20_ACAGTG_L008_R1_009.fastq.qp For your convenience, we have a tool, restore , that will download a tar file, untar it, and uncompress all quip files. module load ycga-public restore -t /SAY/archive/YCGA-729009-YCGA/archive/path/to/file.tar If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the same module as restore.
locateRun C9374AN Restore spends most of its time running quip. You can parallelize and thereby speed up that process using the -n flag. restore -n 20 ... Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node, and make sure to allocate sufficient resources to your job, e.g. -c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See here . Example: Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Initialize compute node with 20 cores salloc -c 20 module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar, matching our netid. step 4 Use the restore utility to copy and uncompress the fastq files from the tar file. By default, restore will start 20 threads, which matches our salloc above. The restore will likely take several minutes. To see progress, you can use the -v flag. restore -v -t /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"YCGA Sequence Data Archive"},{"location":"data/archived-sequencing/#ycga-sequence-data-archive","text":"","title":"YCGA Sequence Data Archive"},{"location":"data/archived-sequencing/#retrieve-data-from-the-archive","text":"In the sequencing archive on McCleary , a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists on tape or in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of a file, it is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/sequencers /ycga-ba/ba_sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ba_sequencers /gpfs/ycga/sequencers/illumina/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencers You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files (over 500GB) sometimes fail to download. If you run into problems, contact us at hpc@yale.edu and we can manually download it.
cd ~/scratch60/somedir tar -xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip -d M20_ACAGTG_L008_R1_009.fastq.qp For your convenience, we have a tool, restore , that will download a tar file, untar it, and uncompress all quip files. module load ycga-public restore -t /SAY/archive/YCGA-729009-YCGA/archive/path/to/file.tar If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the same module as restore. locateRun C9374AN Restore spends most of its time running quip. You can parallelize and thereby speed up that process using the -n flag. restore -n 20 ... Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node, and make sure to allocate sufficient resources to your job, e.g. -c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See here .","title":"Retrieve Data from the Archive"},{"location":"data/archived-sequencing/#example","text":"Imagine that user rdb9 wants to restore data from run BHJWZZBCX3","title":"Example:"},{"location":"data/archived-sequencing/#step-1","text":"Initialize compute node with 20 cores salloc -c 20 module load ycga-public","title":"step 1"},{"location":"data/archived-sequencing/#step-2","text":"Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed.","title":"step 2"},{"location":"data/archived-sequencing/#step-3","text":"List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar, matching our netid.","title":"step 3"},{"location":"data/archived-sequencing/#step-4","text":"Use the restore utility to copy and uncompress the fastq files from the tar file. By default, restore will start 20 threads, which matches our salloc above. The restore will likely take several minutes. To see progress, you can use the -v flag. restore -v -t /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"step 4"},{"location":"data/backups/","text":"Backups and Snapshots The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). See the individual cluster documentation for more details about which storage is backed up or has snapshots. Please see our HPC Policies page for additional information about backups.
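Because only Home is backed up, a common pattern is to keep your own periodic copy of irreplaceable Project data somewhere off-cluster. A minimal rsync sketch is shown below; the source path and remote host are placeholders for wherever your second copy lives (for example, a lab server), and this should not be pointed at the S@Y archive tier, which does not tolerate repeated rsyncs:
# Preview what would be copied; the paths and host below are hypothetical.
rsync -av --dry-run /gpfs/gibbs/project/mygroup/mynetid/results/ mynetid@backuphost.example.edu:/backups/results/
# After reviewing the file list, run the same command without --dry-run to transfer.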
Retrieve Data from Home Backups Contact us with your netid and the list of files/directories you would like restored. For any data deleted in the last couple days, first try the self-service snapshots described below. Retrieve Data from Snapshots Our clusters create snapshots nightly on portions of the filesystem so that you can retrieve mistakenly modified or deleted files for yourself. We do not currently provide snapshots of scratch storage. As long as your files existed in the form you want them in before the most recent midnight and the deletion was in the last few days, they can probably be recovered. Snapshot directory structure mirrors the files that are being tracked with a prefix, listed in the table below. Contact us if you need assistance finding the appropriate snapshot location for your files. File set Snapshot Prefix /gpfs/gibbs/project /gpfs/gibbs/project/.snapshots /gpfs/gibbs/pi/group /gpfs/gibbs/pi/group/.snapshots /vast/palmer/home.grace /vast/palmer/home.grace/.snapshot /vast/palmer/home.mccleary /vast/palmer/home.mccleary/.snapshot /gpfs/ycga /gpfs/ycga/.snapshots /gpfs/milgram/home /gpfs/milgram/home/.snapshots /gpfs/milgram/project /gpfs/milgram/project/.snapshots /gpfs/milgram/pi/groupname /gpfs/milgram/pi/groupname/.snapshots /gpfs/slayman/pi/gerstein /gpfs/slayman/pi/gerstein/.snapshots Within the snapshot directory, you will find multiple directories with names that indicate specific dates. For example, if you wanted to recover the file /gpfs/gibbs/project/bjornson/rdb9/doit.sh (a file in the bjornson group's project directory owned by rdb9) it would be found at /gpfs/gibbs/.snapshots/date/project/bjornson/rdb9/doit.sh . Snapshot Sizes Because of the way snapshots are stored, sizes will not be correctly reported until you copy your files/directories back out of the .snapshots directory.","title":"Backups and Snapshots"},{"location":"data/backups/#backups-and-snapshots","text":"The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). See the individual cluster documentation for more details about which storage is backed up or has snapshots. Please see our HPC Policies page for additional information about backups.","title":"Backups and Snapshots"},{"location":"data/backups/#retrieve-data-from-home-backups","text":"Contact us with your netid and the list of files/directories you would like restored. For any data deleted in the last couple days, first try the self-service snapshots described below.","title":"Retrieve Data from Home Backups"},{"location":"data/backups/#retrieve-data-from-snapshots","text":"Our clusters create snapshots nightly on portions of the filesystem so that you can retrieve mistakenly modified or deleted files for yourself. We do not currently provide snapshots of scratch storage. As long as your files existed in the form you want them in before the most recent midnight and the deletion was in the last few days, they can probably be recovered. Snapshot directory structure mirrors the files that are being tracked with a prefix, listed in the table below. Contact us if you need assistance finding the appropriate snapshot location for your files. 
File set Snapshot Prefix /gpfs/gibbs/project /gpfs/gibbs/project/.snapshots /gpfs/gibbs/pi/group /gpfs/gibbs/pi/group/.snapshots /vast/palmer/home.grace /vast/palmer/home.grace/.snapshot /vast/palmer/home.mccleary /vast/palmer/home.mccleary/.snapshot /gpfs/ycga /gpfs/ycga/.snapshots /gpfs/milgram/home /gpfs/milgram/home/.snapshots /gpfs/milgram/project /gpfs/milgram/project/.snapshots /gpfs/milgram/pi/groupname /gpfs/milgram/pi/groupname/.snapshots /gpfs/slayman/pi/gerstein /gpfs/slayman/pi/gerstein/.snapshots Within the snapshot directory, you will find multiple directories with names that indicate specific dates. For example, if you wanted to recover the file /gpfs/gibbs/project/bjornson/rdb9/doit.sh (a file in the bjornson group's project directory owned by rdb9), it would be found at /gpfs/gibbs/.snapshots/date/project/bjornson/rdb9/doit.sh . Snapshot Sizes Because of the way snapshots are stored, sizes will not be correctly reported until you copy your files/directories back out of the .snapshots directory.","title":"Retrieve Data from Snapshots"},{"location":"data/external/","text":"Share Data Outside Yale Share data using Microsoft OneDrive Yale ITS's recommended way to send other people large files is by using Microsoft OneDrive. See details . Public Website Researchers frequently ask how they can set up a public website to share data or provide a web-based application. The easiest way to do this is by using Yale ITS's spinup service. First get an account on Spinup . Info When getting your account on Spinup, you will need to provide a charging account (aka COA). Static website You can use a static website with a public address to serve data publicly to collaborators or services that need to see the data via http. A common example of this is hosting tracks for the UCSC Genome Browser. Note that this only serves static files. If you wish to host a dynamic web application, see below. ITS's spinup service makes creating a static website easy and inexpensive. Follow their instructions on creating a static website , giving it an appropriate website name. Make sure to save the access key and secret key, since you'll need them to connect to the website. The static website will incur a small charge per month of a few cents per GB stored or downloaded. Then use an S3 transfer tool like Cyberduck, AWS CLI, or CrossFTP to connect to the website and transfer your files. The spinup page for your static website provides a link to a Cyberduck config file. That is probably the easiest way to connect. UCSC Hub To set up the UCSC Hub, follow their directions to set up the appropriate file hierarchy on your static website, using the transfer tool. Web-based application If your web application goes beyond simply serving static data, the best solution is to create a spinup virtual machine (VM), set up your web application on the VM, then follow the spinup instructions on requesting public access to a web server . Info Running a VM 24x7 can incur significant costs on spinup, depending on the size of the VM. Private Share Using Globus Globus can be used to share data hosted on one of the clusters privately with a specific person or group of people. From the file manager interface enter the name of the endpoint you would like to share from in the collection field (e.g.
yale#grace) Click the Share button on the right Click on \"Add a Shared Endpoint\" Next to Path, click \"Browse\" to find and select the directory you want to share Add other details as desired and click on \"Create Share\" Click on \"Add Permissions -- Share With\" Under \"Username or Email\" enter the e-mail address of the person that you want to share the data with, then click on \"Save\", then click on \"Add Permission\" Do not select \"write\" unless you want the person you are sharing the data with to be able to write to your storage on the cluster. For more information, please see the official Globus Documentation .","title":"Share Data Outside Yale"},{"location":"data/external/#share-data-outside-yale","text":"","title":"Share Data Outside Yale"},{"location":"data/external/#share-data-using-microsoft-onedrive","text":"Yale ITS's recommended way to send other people large files is by using Microsoft OneDrive. See details .","title":"Share data using Microsoft OneDrive"},{"location":"data/external/#public-website","text":"Researchers frequently ask how they can set up a public website to share data or provide a web-based application. The easiest way to do this is by using Yale ITS's spinup service. First get an account on Spinup . Info When getting your account on Spinup, you will need to provide a charging account (aka COA).","title":"Public Website"},{"location":"data/external/#static-website","text":"You can use a static website with a public address to serve data publicly to collaborators or services that need to see the data via http. A common example of this is hosting tracks for the UCSC Genome Browser. Note that this only serves static files. If you wish to host a dynamic web application, see below. ITS's spinup service makes creating a static website easy and inexpensive. Follow their instructions on creating a static website , giving it an appropriate website name. Make sure to save the access key and secret key, since you'll need them to connect to the website. The static website will incur a small charge per month of a few cents per GB stored or downloaded. Then use an S3 transfer tool like Cyberduck, AWS CLI, or CrossFTP to connect to the website and transfer your files. The spinup page for your static website provides a link to a Cyberduck config file. That is the probably the easiest way to connect.","title":"Static website"},{"location":"data/external/#ucsc-hub","text":"To set up the UCSC Hub, follow their directions to set up the appropriate file heirarchy on your static website, using the transfer tool.","title":"UCSC Hub"},{"location":"data/external/#web-based-application","text":"If your web application goes beyond simply serving static data, the best solution is to create a spinup virtual machine (VM), set up your web application on the VM, then follow the spinup instructions on requesting public access to a web server Info Running a VM 24x7 can incur significant costs on spinup, depending on the size of the VM.","title":"Web-based application"},{"location":"data/external/#private-share-using-globus","text":"Globus can be used to shared data hosts on one of the clusters privately with a specific person or group of people. From the file manager interface enter the name of the endpoint you would like to share from in the collection field (e.g. 
yale#grace) Click the Share button on the right Click on \"Add a Shared Endpoint\" Next to Path, click \"Browse\" to find and select the directory you want to share Add other details as desired and click on \"Create Share\" Click on \"Add Permissions -- Share With\" Under \"Username or Email\" enter the e-mail address of the person that you want to share the data with, then click on \"Save\", then click on \"Add Permission\" Do not select \"write\" unless you want the person you are sharing the data with to be able to write to your storage on the cluster. For more information, please see the official Globus Documentation .","title":"Private Share Using Globus"},{"location":"data/globus/","text":"Large Transfers with Globus For large data transfers both within Yale and to external collaborators, we recommend using Globus. Globus is a file transfer service that is efficient and easy to use. It has several advantages: Robust and fast transfers of large files and/or large collections of files. Files can be transferred between your computer and the clusters. Files can be transferred between Yale and other sites. A web and command-line interface for starting and monitoring transfers. Access to specific files or directories granted to external collaborators in a secure way. Globus transfers data between computers set up as \"endpoints\". The official YCRC endpoints are listed below. Transfers can be to and from these endpoints or those you have defined for yourself with Globus Connect . Course Accounts Globus does not work for course accounts ( _ ). Please try the other transfer methods listed in our Transfer documentation instead. Cluster Endpoints We currently support endpoints for the following clusters. Cluster Globus Endpoint Grace yale#grace McCleary Yale CRC McCleary Milgram Yale CRC Milgram For Grace and McCleary, these endpoints provide access to all files you normally have access to. For security reasons, Milgram Globus uses a staging area ( /gpfs/milgram/globus/$NETID ). Once uploaded, data should be moved from this staging area to its final location within Milgram. Files in the staging area are purged after 21 days. Get Started with Globus In a browser, go to app.globus.org . Use the pull-down menu to select Yale and click \"Continue\". If you are not already logged into CAS, you will be prompted to log in. [First login only] Do not associate with another account yet unless you are familiar with doing this [First login only] Select \"non-profit research or educational purposes\" [First login only] Click on \"Allow\" for allowing Globus Web App From the file manager interface enter the name of the endpoint you would like to browse in the collection field (e.g. yale#grace) Click on the right-hand side menu option \"Transfer or Sync to...\" Enter the second endpoint name in the right search box (e.g. another cluster or your personal endpoint) Select one or more files you would like to transfer and click the appropriate start button on the bottom. To complete a partial transfer, you can click the \"sync\" checkbox in the Transfer Setting window on the Globus page, and hten Globus should resume the transfer where it left off. Manage Your Endpoints To manage your endpoints, such as delete an endpoint, rename it, or share it with additional people (be aware, they will be able to access your storage), go to Manage Endpoint on the Globus website. 
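In addition to the web app, Globus provides a command-line client that can start and monitor the same transfers. The sketch below assumes you have installed the Globus CLI yourself (for example with pip) and uses placeholder endpoint UUIDs and paths; check globus --help for the exact syntax of the version you install:
pip install --user globus-cli          # one-time install into your own environment (assumes pip is available)
globus login                           # opens a browser window to authenticate via CAS
globus endpoint search "yale#grace"    # note the UUID of the endpoint you want
# SRC_UUID and DST_UUID below stand in for the UUIDs found above
globus transfer --recursive --label "my transfer" SRC_UUID:/path/to/source DST_UUID:/path/to/destination
globus task list                       # monitor the transfer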
Setup an Endpoint on Your Computer You can set up your own endpoint for transferring data to and from your own computer with Globus Connect Personal . To transfer or share data between two personal endpoints, you will need to request access to the YCRC's Globus Plus subscription on this page . Setup a Google Drive Endpoint The Globus connector is configured to only allow data to be uploaded into EliApps (Yale's GSuite for Education) Google Drive accounts. If you don't have an EliApps account, request one as described above. To set up your Globus Google Drive endpoint, click on the following link: Setup Globus Google Drive Endpoint Log into Globus, if needed. The first time you login to the Globus Google Drive endpoint, you will be presented with a permissions approval page. If you are ok with the Connector manipulating your files through Globus (which is required), click the Allow button. You may see your Yale EliApps account expressed in an uncommon format, such as netid@yale.edu@accounts.google.com. This is normal, and expected. After your approvals you will be directed to the Globus File Manager, with the default view of \"/My Drive\". To see \"/Team Drives\" and other Google Drive features use the \"up one folder\" arrow icon in the File Manager. To transfer to or from your Google Drive, search in the Collection field for \"YCRC Globus Google Drive Collection\". Note There are \"rate limits\" to how much data and how many files you can transfer in any 24 hours period. If you have hit your rate limit, Globus should automatically resume the transfer during the next 24 hour period. You see a \"Endpoint Busy\" error during this time. Google has a 400,000 file limit per Shared Drive , so if you are archiving data to Google Drive, it is better to compress folders that contain lots of small files (e.g. using tar ) before transferring. In our testing, we have seen up to 10MB/s upload and 100MB/s download speeds. Setup a S3 Endpoint We support creating Globus S3 endpoints. To request a Globus S3 Endpoint, please contact YCRC . Please include in your request: S3 bucket name The Amazon Region for that bucket An initial list of Yale NetIDs who should be able to access the bucket Warning Please DO NOT send us the Amazon login credentials through an insecure method such as email or our ticketing system. After we have created your Globus S3 endpoint, you will be able to further self-serve you own access controls with the Globus portal.","title":"Large Transfers with Globus"},{"location":"data/globus/#large-transfers-with-globus","text":"For large data transfers both within Yale and to external collaborators, we recommend using Globus. Globus is a file transfer service that is efficient and easy to use. It has several advantages: Robust and fast transfers of large files and/or large collections of files. Files can be transferred between your computer and the clusters. Files can be transferred between Yale and other sites. A web and command-line interface for starting and monitoring transfers. Access to specific files or directories granted to external collaborators in a secure way. Globus transfers data between computers set up as \"endpoints\". The official YCRC endpoints are listed below. Transfers can be to and from these endpoints or those you have defined for yourself with Globus Connect . Course Accounts Globus does not work for course accounts ( _ ). 
Please try the other transfer methods listed in our Transfer documentation instead.","title":"Large Transfers with Globus"},{"location":"data/globus/#cluster-endpoints","text":"We currently support endpoints for the following clusters. Cluster Globus Endpoint Grace yale#grace McCleary Yale CRC McCleary Milgram Yale CRC Milgram For Grace and McCleary, these endpoints provide access to all files you normally have access to. For security reasons, Milgram Globus uses a staging area ( /gpfs/milgram/globus/$NETID ). Once uploaded, data should be moved from this staging area to its final location within Milgram. Files in the staging area are purged after 21 days.","title":"Cluster Endpoints"},{"location":"data/globus/#get-started-with-globus","text":"In a browser, go to app.globus.org . Use the pull-down menu to select Yale and click \"Continue\". If you are not already logged into CAS, you will be prompted to log in. [First login only] Do not associate with another account yet unless you are familiar with doing this [First login only] Select \"non-profit research or educational purposes\" [First login only] Click on \"Allow\" for allowing Globus Web App From the file manager interface enter the name of the endpoint you would like to browse in the collection field (e.g. yale#grace) Click on the right-hand side menu option \"Transfer or Sync to...\" Enter the second endpoint name in the right search box (e.g. another cluster or your personal endpoint) Select one or more files you would like to transfer and click the appropriate start button on the bottom. To complete a partial transfer, you can click the \"sync\" checkbox in the Transfer Setting window on the Globus page, and hten Globus should resume the transfer where it left off.","title":"Get Started with Globus"},{"location":"data/globus/#manage-your-endpoints","text":"To manage your endpoints, such as delete an endpoint, rename it, or share it with additional people (be aware, they will be able to access your storage), go to Manage Endpoint on the Globus website.","title":"Manage Your Endpoints"},{"location":"data/globus/#setup-an-endpoint-on-your-computer","text":"You can set up your own endpoint for transferring data to and from your own computer with Globus Connect Personal . To transfer or share data between two personal endpoints, you will need to request access to the YCRC's Globus Plus subscription on this page .","title":"Setup an Endpoint on Your Computer"},{"location":"data/globus/#setup-a-google-drive-endpoint","text":"The Globus connector is configured to only allow data to be uploaded into EliApps (Yale's GSuite for Education) Google Drive accounts. If you don't have an EliApps account, request one as described above. To set up your Globus Google Drive endpoint, click on the following link: Setup Globus Google Drive Endpoint Log into Globus, if needed. The first time you login to the Globus Google Drive endpoint, you will be presented with a permissions approval page. If you are ok with the Connector manipulating your files through Globus (which is required), click the Allow button. You may see your Yale EliApps account expressed in an uncommon format, such as netid@yale.edu@accounts.google.com. This is normal, and expected. After your approvals you will be directed to the Globus File Manager, with the default view of \"/My Drive\". To see \"/Team Drives\" and other Google Drive features use the \"up one folder\" arrow icon in the File Manager. 
To transfer to or from your Google Drive, search in the Collection field for \"YCRC Globus Google Drive Collection\". Note There are \"rate limits\" to how much data and how many files you can transfer in any 24 hours period. If you have hit your rate limit, Globus should automatically resume the transfer during the next 24 hour period. You see a \"Endpoint Busy\" error during this time. Google has a 400,000 file limit per Shared Drive , so if you are archiving data to Google Drive, it is better to compress folders that contain lots of small files (e.g. using tar ) before transferring. In our testing, we have seen up to 10MB/s upload and 100MB/s download speeds.","title":"Setup a Google Drive Endpoint"},{"location":"data/globus/#setup-a-s3-endpoint","text":"We support creating Globus S3 endpoints. To request a Globus S3 Endpoint, please contact YCRC . Please include in your request: S3 bucket name The Amazon Region for that bucket An initial list of Yale NetIDs who should be able to access the bucket Warning Please DO NOT send us the Amazon login credentials through an insecure method such as email or our ticketing system. After we have created your Globus S3 endpoint, you will be able to further self-serve you own access controls with the Globus portal.","title":"Setup a S3 Endpoint"},{"location":"data/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that 
is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. 
Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent terminals alongside each other","title":"Glossary"},{"location":"data/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"data/google-drive/","text":"Google Drive Through Yale Google Apps for Education (EliApps), researchers have access to 5GB of storage with the option to purchase additional storage as needed. The Globus Google Drive connector allows you to create a Globus endpoint that allows you to use the Globus infrastructure to transfer data into your Google Drive account. As always, no sensitive data (e.g. ePHI, HIPAA) is allowed in Google Drive storage. EliApps If your Yale email account is already an EliApps account (Gmail), then you are all set. If your Yale email is in Microsoft Office365, send an email to the ITS helpdesk requesting a \"no-email EliApps account\". Once it is created you can login to Google Drive using your EliApps account name, which will be of the form netid@yale.edu . The Globus connector is configured to only allow data to be uploaded into EliApps Google Drive accounts. Google Shared Drives (formerly Team Drive) Shared Drives is an additional feature for EliApps that is available by request only (at the moment). A Shared Drive is a Google Drive space that solves a lot of ownership and permissions issues present with traditional shared Google Drive folder. Once you create a Shared Drive, e.g. for a project or research group, any data placed in that Drive are owned by the drive and the permission (which accounts can own or access the data) can be easily managed from the Shared Drive interface by drive owners. With Shared Drive, you can be sure the data will stay with research group as students and postdocs come and go. If your group already uses Google Drive, contact us if you need additional Shared Drives. Although group members are limited to a default of 5GB of EliApps Storage, this can be increased as needed by reaching out through the Yale ITS Google Shared page . Aside from these quota limits, there are also limits for Google Shared Drives put in place by Google directly. Some are listed below. Warning To keep file counts low (and for easier data retrieval) we highly recommended that you archive your data using zip or tar . Limit type Limit Number of files and folders 400,000 Daily upload cap 750 GiB Max individual file size 5 TiB Max number of nested folders 20 Local File Access You can upload and access your data using the web portal and sync data with your local machines via the Google File Stream software. For sync with your local machine, install Drive for desktop . Authenticate with your EliApps account and you will see Google Drive mounted as an additional drive on your machine. Rclone You can also transfer data using the command line utility Rclone . Rclone can be used to transfer data to any Google Drive account. Globus Google Drive Connector You can use Globus to transfer data to/from any EliApps Google Drive as well. See our Globus documentation for more information.","title":"Google Drive"},{"location":"data/google-drive/#google-drive","text":"Through Yale Google Apps for Education (EliApps), researchers have access to 5GB of storage with the option to purchase additional storage as needed. The Globus Google Drive connector allows you to create a Globus endpoint that allows you to use the Globus infrastructure to transfer data into your Google Drive account. As always, no sensitive data (e.g. 
ePHI, HIPAA) is allowed in Google Drive storage.","title":"Google Drive"},{"location":"data/google-drive/#eliapps","text":"If your Yale email account is already an EliApps account (Gmail), then you are all set. If your Yale email is in Microsoft Office365, send an email to the ITS helpdesk requesting a \"no-email EliApps account\". Once it is created you can login to Google Drive using your EliApps account name, which will be of the form netid@yale.edu . The Globus connector is configured to only allow data to be uploaded into EliApps Google Drive accounts.","title":"EliApps"},{"location":"data/google-drive/#google-shared-drives-formerly-team-drive","text":"Shared Drives is an additional feature for EliApps that is available by request only (at the moment). A Shared Drive is a Google Drive space that solves a lot of ownership and permissions issues present with traditional shared Google Drive folder. Once you create a Shared Drive, e.g. for a project or research group, any data placed in that Drive are owned by the drive and the permission (which accounts can own or access the data) can be easily managed from the Shared Drive interface by drive owners. With Shared Drive, you can be sure the data will stay with research group as students and postdocs come and go. If your group already uses Google Drive, contact us if you need additional Shared Drives. Although group members are limited to a default of 5GB of EliApps Storage, this can be increased as needed by reaching out through the Yale ITS Google Shared page . Aside from these quota limits, there are also limits for Google Shared Drives put in place by Google directly. Some are listed below. Warning To keep file counts low (and for easier data retrieval) we highly recommended that you archive your data using zip or tar . Limit type Limit Number of files and folders 400,000 Daily upload cap 750 GiB Max individual file size 5 TiB Max number of nested folders 20","title":"Google Shared Drives (formerly Team Drive)"},{"location":"data/google-drive/#local-file-access","text":"You can upload and access your data using the web portal and sync data with your local machines via the Google File Stream software. For sync with your local machine, install Drive for desktop . Authenticate with your EliApps account and you will see Google Drive mounted as an additional drive on your machine.","title":"Local File Access"},{"location":"data/google-drive/#rclone","text":"You can also transfer data using the command line utility Rclone . Rclone can be used to transfer data to any Google Drive account.","title":"Rclone"},{"location":"data/google-drive/#globus-google-drive-connector","text":"You can use Globus to transfer data to/from any EliApps Google Drive as well. See our Globus documentation for more information.","title":"Globus Google Drive Connector"},{"location":"data/group-change/","text":"Group Change When your PI is changed, the primary group of your account on the cluster will also be changed. As a result, you will have a new storage space on the cluster which belongs to the new group, including Home, Project, Scratch, etc. We will change the primary group of your cluster account to the new group and will move all the files stored in your old storage space into the new storage space. However, some local installations most likely will not be able to work properly after being moved. In particular, Conda environments and R packages will fail. You need to rebuild them in your new space under the new group. 
For R packages, you just need to reinstall them with install.packages() . Rebuild a Conda Environment after Group Change We will use an example to illustrate how to rebuild a conda env after a group change. Assume the conda env is originally installed in /gpfs/gibbs/project/oldgrp/user123 , and we want to move it to the project directory of the new group. First, find the paths of the conda env stored in your old space that you want to rebuild in the new space. Set two environment variables CONDA_ENVS_PATH and CONDA_PKGS_DIRS to the paths. module load miniconda export CONDA_ENVS_PATH=/gpfs/gibbs/project/oldgrp/user123/conda_envs export CONDA_PKGS_DIRS=/gpfs/gibbs/project/oldgrp/user123/conda_pkgs conda activate myenv conda env export > myenv.yml conda deactivate Now, start a new login session, submit an interactive job, and rebuild the conda env in your new storage space. When a new session starts, CONDA_ENVS_PATH and CONDA_PKGS_DIRS will be set to the right locations by the system, so you don't have to set them explicitly. ssh grace salloc module load miniconda conda env create -f myenv.yml","title":"Group Change"},{"location":"data/group-change/#group-change","text":"When your PI is changed, the primary group of your account on the cluster will also be changed. As a result, you will have a new storage space on the cluster which belongs to the new group, including Home, Project, Scratch, etc. We will change the primary group of your cluster account to the new group and will move all the files stored in your old storage space into the new storage space. However, some local installations most likely will not be able to work properly after being moved. In particular, Conda environments and R packages will fail. You need to rebuild them in your new space under the new group. For R packages, you just need to reinstall them with install.packages() .","title":"Group Change"},{"location":"data/group-change/#rebuild-a-conda-environment-after-group-change","text":"We will use an example to illustrate how to rebuild a conda env after a group change. Assume the conda env is originally installed in /gpfs/gibbs/project/oldgrp/user123 , and we want to move it to the project directory of the new group. First, find the paths of the conda env stored in your old space that you want to rebuild in the new space. Set two environment variables CONDA_ENVS_PATH and CONDA_PKGS_DIRS to the paths. module load miniconda export CONDA_ENVS_PATH=/gpfs/gibbs/project/oldgrp/user123/conda_envs export CONDA_PKGS_DIRS=/gpfs/gibbs/project/oldgrp/user123/conda_pkgs conda activate myenv conda env export > myenv.yml conda deactivate Now, start a new login session, submit an interactive job, and rebuild the conda env in your new storage space. When a new session starts, CONDA_ENVS_PATH and CONDA_PKGS_DIRS will be set to the right locations by the system, so you don't have to set them explicitly. ssh grace salloc module load miniconda conda env create -f myenv.yml","title":"Rebuild a Conda Environment after Group Change"},{"location":"data/hpc-storage/","text":"HPC Storage Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limits both the amount in bytes and the number of files you can store. Hitting your quota stops you from being able to write data, and can cause jobs to fail . You can monitor your storage usage by running the getquota command on a cluster.
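If you just want the size of a single directory, a standard tool such as du can complement getquota (a minimal sketch, not a YCRC-specific tool; the path is a placeholder): du -sh $HOME/project/my_dataset # prints the total size of that directory tree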
No sensitive data can be stored on any cluster storage, except for Milgram . Backups The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). Please see our HPC Policies page for additional information about backups. Storage Spaces For an overview of which filesystems are mounted on each cluster, see the HPC Resources documentation. Home Quota: 125 GiB and 500,000 files per person Your home directory is where your sessions begin by default. Its intended use is for storing scripts, notes, final products (e.g. figures), etc. Its path is /home/netid (where netid is your Yale netid) on every cluster. Home storage is backed up daily. If you would like to restore files, please contact us with your netid and the list of files/directories you would like restored. Project Quota: 1 TiB and 5,000,000 files per group, expanded to 4 TiB on request Project storage is shared among all members of a specific group. Project storage is not backed up , so we strongly recommend that you have a second copy somewhere off-cluster of any valuable data you have stored in project. You can access this space through a symlink, or shortcut, in your home directory called project . See our Sharing Data documentation for instructions on sharing data in your project space with other users. Project quotas are global to the whole project space, so if the group ownership on a file is your group, it will count towards your quota, regardless of its location within project . This can occasionally create confusion for users who belong to multiple groups and they need to be mindful of which files are owned by which of their group affiliations to ensure proper accounting. Purchased Storage Quota: varies Storage purchased for the dedicated use by a single group or collection of groups provides similar functionality as project storage and is also not backed up. See below for details on purchasing storage. Purchased storage, if applicable, is located on the Gibbs filesystem in a /gpfs/gibbs/pi/ directory under the group's name. Unlike project space described above, all files in your purchased storage count towards your quotas, regardless of file ownership. All purchased storage 60-Day Scratch Quota: 10 TiB and 15,000,000 files per group 60-day scratch is intended to be used for storing temporary data. Any file in this space older than 60 days will automatically be deleted. We send out a weekly warning about files we expect to delete the following week. Like project, scratch quota is shared by your entire research group. If we begin to run low on storage, you may be asked to delete files younger than 60 days old. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. You can access this space through a symlink, or shortcut, in your home directory called palmer_scratch (or scratch60 on Milgram ). See our Sharing Data documentation for instructions on sharing data in your scratch space with other users. Check Your Usage and Quotas To inspect your current usage, run the command getquota . Here is an example output of the command: This script shows information about your quotas on grace. 
If you plan to poll this sort of information extensively, please contact us for help at hpc@yale.edu ## Usage Details for support (as of Jan 25 2023 12:00) Fileset User Usage (GiB) File Count ---------------------- ----- ---------- ------------- gibbs:project ahs3 568 121,786 gibbs:project kln26 435 423,219 gibbs:project ms725 233 456,736 gibbs:project pl543 427 1,551,959 gibbs:project rdb9 1952 1,049,346 gibbs:project tl397 605 2,573,824 ---- gibbs:pi_support ahs3 0 1 gibbs:pi_support kln26 5886 14,514,143 gibbs:pi_support ms725 19651 2,692,158 gibbs:pi_support pl543 328 142,936 gibbs:pi_support rdb9 1047 165,553 gibbs:pi_support tl397 175 118,038 ## Quota Summary for support (as of right now [*palmer stats are gathered once a day]) Fileset Type Usage (GiB) Quota (GiB) File Count File Limit Backup Purged ---------------------- ------- ------------ ----------- ------------- ------------- --------- --------- palmer:home.grace USR 63 125 216,046 500,000 Yes No gibbs:project GRP 3832 10240 3,350,198 10,000,000 No No palmer:scratch GRP 0 10240 903 15,000,000 No 60 days gibbs:pi_support FILESET 27240 30720 17,647,694 22,000,000 No No The per-user breakdown is only generated periodically, and the summary at the bottom is close to real-time. Purchased storage allocations will only appear in the getquota output for users who have data in that directory. Purchase Additional Storage For long-term allocations, additional project storage spaces can be purchased on our Gibbs filesystem, which provides similar functionality to the primary project storage. This storage currently costs $200/TiB (minimum of 10 TiB, with exact pricing to be confirmed before a purchase is made). The price covers all costs, including administration, power, cooling, networking, etc. YCRC commits to making the storage available for 5 years from the purchase date, after which the storage allocation will need to be renewed, or the allocation will expire and be removed (see Storage Expiration Policy ). For shorter-term or smaller allocations, we have a monthly billing option. More details on this option can be found here (CAS login required). Please note that, as with existing project storage, purchased storage will not be backed up, so you should make arrangements for the safekeeping of critical files off the clusters. Please contact us with your requirements and budget to start the purchasing process. Purchased storage, as with all storage allocations, are subject to corresponding file count limit to preserve the health of the shared storage system. The file count limits for different size allocations are listed above. If you need additional files beyond your limit, contact us to discuss as increases may be granted on a case-by-case basis and at the YCRC's discretion. Allocation Quota File Count Limit < 50 TiB 10 million 50-99 TiB 20 million 100-499 TiB 40 million 500-999 TiB 50 million >= 1 PiB 75 million HPC Storage Best Practices Stage Data Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. If these data are too large to fit in your current quotas and you do not plan on purchasing more storage (see above), you must 'stage' your data. Since the permanent copy of the data remains on off-cluster storage, you can transfer a working copy to palmer_scratch , for example. Both Grace and McCleary have dedicated transfer partitions where you can submit long-running transfer jobs. 
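For example, a single rsync copy can be wrapped in a batch job on the transfer partition so it keeps running after you log out (a minimal sketch; the remote host and paths are placeholders): sbatch --partition=transfer --time=6:00:00 --wrap='rsync -avP netID@department_server:/path/to/data $HOME/palmer_scratch/'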
When your computation finishes you can remove the copy and transmit or copy results to a permanent location. Please see the Staging Data documentation for more details and examples. Prevent Large Numbers of Small Files The parallel filesystems the clusters use perform poorly with very large numbers of small files. This is one reason we enforce file count quotas. If you are running an application that unavoidably make large numbers of files, do what you can to reduce file creation. Additionally you can reduce load on the filesystem by spreading the files across multiple subdirectories. Delete unneeded files between jobs and compress or archive collections of files.","title":"HPC Storage"},{"location":"data/hpc-storage/#hpc-storage","text":"Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Hitting your quota stops you from being able to write data, and can cause jobs to fail . You can monitor your storage usage by running the getquota command on a cluster. No sensitive data can be stored on any cluster storage, except for Milgram . Backups The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). Please see our HPC Policies page for additional information about backups.","title":"HPC Storage"},{"location":"data/hpc-storage/#storage-spaces","text":"For an overview of which filesystems are mounted on each cluster, see the HPC Resources documentation.","title":"Storage Spaces"},{"location":"data/hpc-storage/#home","text":"Quota: 125 GiB and 500,000 files per person Your home directory is where your sessions begin by default. Its intended use is for storing scripts, notes, final products (e.g. figures), etc. Its path is /home/netid (where netid is your Yale netid) on every cluster. Home storage is backed up daily. If you would like to restore files, please contact us with your netid and the list of files/directories you would like restored.","title":"Home"},{"location":"data/hpc-storage/#project","text":"Quota: 1 TiB and 5,000,000 files per group, expanded to 4 TiB on request Project storage is shared among all members of a specific group. Project storage is not backed up , so we strongly recommend that you have a second copy somewhere off-cluster of any valuable data you have stored in project. You can access this space through a symlink, or shortcut, in your home directory called project . See our Sharing Data documentation for instructions on sharing data in your project space with other users. Project quotas are global to the whole project space, so if the group ownership on a file is your group, it will count towards your quota, regardless of its location within project . This can occasionally create confusion for users who belong to multiple groups and they need to be mindful of which files are owned by which of their group affiliations to ensure proper accounting.","title":"Project"},{"location":"data/hpc-storage/#purchased-storage","text":"Quota: varies Storage purchased for the dedicated use by a single group or collection of groups provides similar functionality as project storage and is also not backed up. See below for details on purchasing storage. 
Purchased storage, if applicable, is located on the Gibbs filesystem in a /gpfs/gibbs/pi/ directory under the group's name. Unlike project space described above, all files in your purchased storage count towards your quotas, regardless of file ownership. All purchased storage","title":"Purchased Storage"},{"location":"data/hpc-storage/#60-day-scratch","text":"Quota: 10 TiB and 15,000,000 files per group 60-day scratch is intended to be used for storing temporary data. Any file in this space older than 60 days will automatically be deleted. We send out a weekly warning about files we expect to delete the following week. Like project, scratch quota is shared by your entire research group. If we begin to run low on storage, you may be asked to delete files younger than 60 days old. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. You can access this space through a symlink, or shortcut, in your home directory called palmer_scratch (or scratch60 on Milgram ). See our Sharing Data documentation for instructions on sharing data in your scratch space with other users.","title":"60-Day Scratch"},{"location":"data/hpc-storage/#check-your-usage-and-quotas","text":"To inspect your current usage, run the command getquota . Here is an example output of the command: This script shows information about your quotas on grace. If you plan to poll this sort of information extensively, please contact us for help at hpc@yale.edu ## Usage Details for support (as of Jan 25 2023 12:00) Fileset User Usage (GiB) File Count ---------------------- ----- ---------- ------------- gibbs:project ahs3 568 121,786 gibbs:project kln26 435 423,219 gibbs:project ms725 233 456,736 gibbs:project pl543 427 1,551,959 gibbs:project rdb9 1952 1,049,346 gibbs:project tl397 605 2,573,824 ---- gibbs:pi_support ahs3 0 1 gibbs:pi_support kln26 5886 14,514,143 gibbs:pi_support ms725 19651 2,692,158 gibbs:pi_support pl543 328 142,936 gibbs:pi_support rdb9 1047 165,553 gibbs:pi_support tl397 175 118,038 ## Quota Summary for support (as of right now [*palmer stats are gathered once a day]) Fileset Type Usage (GiB) Quota (GiB) File Count File Limit Backup Purged ---------------------- ------- ------------ ----------- ------------- ------------- --------- --------- palmer:home.grace USR 63 125 216,046 500,000 Yes No gibbs:project GRP 3832 10240 3,350,198 10,000,000 No No palmer:scratch GRP 0 10240 903 15,000,000 No 60 days gibbs:pi_support FILESET 27240 30720 17,647,694 22,000,000 No No The per-user breakdown is only generated periodically, and the summary at the bottom is close to real-time. Purchased storage allocations will only appear in the getquota output for users who have data in that directory.","title":"Check Your Usage and Quotas"},{"location":"data/hpc-storage/#purchase-additional-storage","text":"For long-term allocations, additional project storage spaces can be purchased on our Gibbs filesystem, which provides similar functionality to the primary project storage. This storage currently costs $200/TiB (minimum of 10 TiB, with exact pricing to be confirmed before a purchase is made). The price covers all costs, including administration, power, cooling, networking, etc. YCRC commits to making the storage available for 5 years from the purchase date, after which the storage allocation will need to be renewed, or the allocation will expire and be removed (see Storage Expiration Policy ). 
For shorter-term or smaller allocations, we have a monthly billing option. More details on this option can be found here (CAS login required). Please note that, as with existing project storage, purchased storage will not be backed up, so you should make arrangements for the safekeeping of critical files off the clusters. Please contact us with your requirements and budget to start the purchasing process. Purchased storage, as with all storage allocations, are subject to corresponding file count limit to preserve the health of the shared storage system. The file count limits for different size allocations are listed above. If you need additional files beyond your limit, contact us to discuss as increases may be granted on a case-by-case basis and at the YCRC's discretion. Allocation Quota File Count Limit < 50 TiB 10 million 50-99 TiB 20 million 100-499 TiB 40 million 500-999 TiB 50 million >= 1 PiB 75 million","title":"Purchase Additional Storage"},{"location":"data/hpc-storage/#hpc-storage-best-practices","text":"","title":"HPC Storage Best Practices"},{"location":"data/hpc-storage/#stage-data","text":"Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. If these data are too large to fit in your current quotas and you do not plan on purchasing more storage (see above), you must 'stage' your data. Since the permanent copy of the data remains on off-cluster storage, you can transfer a working copy to palmer_scratch , for example. Both Grace and McCleary have dedicated transfer partitions where you can submit long-running transfer jobs. When your computation finishes you can remove the copy and transmit or copy results to a permanent location. Please see the Staging Data documentation for more details and examples.","title":"Stage Data"},{"location":"data/hpc-storage/#prevent-large-numbers-of-small-files","text":"The parallel filesystems the clusters use perform poorly with very large numbers of small files. This is one reason we enforce file count quotas. If you are running an application that unavoidably make large numbers of files, do what you can to reduce file creation. Additionally you can reduce load on the filesystem by spreading the files across multiple subdirectories. Delete unneeded files between jobs and compress or archive collections of files.","title":"Prevent Large Numbers of Small Files"},{"location":"data/loomis-decommission/","text":"Loomis Decommission After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), was retired in December 2022. Since its inception, Loomis doubled in size to host over 2 petabytes of data for more than 600 research groups and almost 4000 individual researchers. The usage and capacity on Loomis has been replaced by two existing YCRC storage systems, Palmer and Gibbs. Unified Storage at the YCRC 2022 saw the introduction of a more unified approach to storage across the YCRC\u2019s clusters. Each group will have one project and one scratch space that is available on all of the HPC clusters (except for Milgram). Project A single project space to host no-cost project-style storage allocations is available on the Gibbs storage system. Purchased allocations are also on Gibbs under the /gpfs/gibbs/pi space of the storage system. Grace users are using this space as of the August 2022 maintenance. Scratch A single scratch space on Palmer, available for Grace users at /vast/palmer/scratch, serves both Grace and McCleary cluster (replacement for Farnam and Ruddle). 
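You can confirm where your own scratch space lives by resolving the symlink in your home directory (a minimal sketch; the exact target varies by group): readlink -f ~/palmer_scratch # should resolve to your group's directory under /vast/palmer/scratch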
The Loomis scratch space was decommissioned and purged on October 3, 2022. Software In 2023, a new unified software and module tree was created on Palmer, so the same software will be available for use regardless of which YCRC HPC cluster you are using. We have migrated the software located in /gpfs/loomis/apps/avx to Palmer at /vast/palmer/apps/grace.avx. To continue to support this software without interruption, we are maintaining a symlink at /gpfs/loomis/apps/avx to the new location on Palmer, so software will continue to appear as if it is on Loomis even after the maintenance, despite being hosted on Palmer. In August 2023, Grace was upgraded to Red Hat 8 and this old software tree was deprecated and is no longer supported. What about Existing Data on Loomis? Your Grace home directory was already migrated to Palmer during the January 2022 maintenance. During the Grace Maintenance in August 2022, we migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project . You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you had a project space that exceeds the no-cost allocation (4 TiB), your data was migrated to a new allocation under /gpfs/gibbs/pi . In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. The Loomis scratch space was decommissioned and purged on October 3, 2022. Conda Environments By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R Packages Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages(). Custom Software Installations If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Decommission of Old, Deprecated Software Trees As part of the Loomis Decommission, we did not migrate the old software trees located at /gpfs/loomis/apps/hpc, /gpfs/loomis/apps/hpc.rhel6 and /gpfs/loomis/apps/hpc.rhel7. The deprecated modules can be identified as being prefixed with \"Apps/\", \"GPU/\", \"Libs/\" or \"MPI/\" rather than beginning with the software name. If you are using software modules in one of the old trees, please find an alternative in the current supported tree or reach out to us to install a replacement. Researchers with Purchased Storage on Loomis If you had purchased space that is still active (not expired), we created a new area of the same size for you on Gibbs and transferred your data.
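If you want to find scripts that still reference the old Loomis paths mentioned above, a recursive grep is a quick way to locate them (a minimal sketch; point it at wherever your scripts actually live): grep -rl '/gpfs/loomis' $HOME/project/scripts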
If you have purchased storage on /gpfs/loomis that has expired or will be expiring in 2022 and you chose not to renew, any data in that allocation is now retired.","title":"Loomis Decommission"},{"location":"data/loomis-decommission/#loomis-decommission","text":"After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), was retired in December 2022. Since its inception, Loomis doubled in size to host over 2 petabytes of data for more than 600 research groups and almost 4000 individual researchers. The usage and capacity on Loomis has been replaced by two existing YCRC storage systems, Palmer and Gibbs.","title":"Loomis Decommission"},{"location":"data/loomis-decommission/#unified-storage-at-the-ycrc","text":"2022 saw the introduction of a more unified approach to storage across the YCRC\u2019s clusters. Each group will have one project and one scratch space that is available on all of the HPC clusters (except for Milgram).","title":"Unified Storage at the YCRC"},{"location":"data/loomis-decommission/#project","text":"A single project space to host no-cost project-style storage allocations is available on the Gibbs storage system. Purchased allocations are also on Gibbs under the /gpfs/gibbs/pi space of the storage system. Grace users are using this space as of the August 2022 maintenance.","title":"Project"},{"location":"data/loomis-decommission/#scratch","text":"A single scratch space on Palmer, available for Grace users at /vast/palmer/scratch, serves both Grace and McCleary cluster (replacement for Farnam and Ruddle). The Loomis scratch space was decommissioned and purged on October 3, 2022.","title":"Scratch"},{"location":"data/loomis-decommission/#software","text":"In 2023, a new unified software and module tree was created on Palmer, so the same software will be available for use regardless of which YCRC HPC cluster you are using. We have migrated the software located in /gpfs/loomis/apps/avx to Palmer at /vast/palmer/apps/grace.avx. To continue to support this software without interruption, we are maintaining a symlink at /gpfs/loomis/apps/avx to the new location on Palmer, so software will continue to appear as if it is on Loomis even after the maintenance, despite being hosted on Palmer. In August 2023, Grace was upgraded to Red Hat 8 and this old software tree was deprecated and is no longer supported.","title":"Software"},{"location":"data/loomis-decommission/#what-about-existing-data-on-loomis","text":"Your Grace home directory was already migrated to Palmer during the January 2022 maintenance. During the Grace Maintenance in August 2022, we migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project . You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you had a project space that exceeds the no-cost allocation (4TiB), your data was migrated to a new allocation under /gpfs/gibbs/pi . In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. 
The Loomis scratch space was decommissioned and purged on October 3, 2022.","title":"What about Existing Data on Loomis?"},{"location":"data/loomis-decommission/#conda-environments","text":"By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation .","title":"Conda Environments"},{"location":"data/loomis-decommission/#r-packages","text":"Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages.","title":"R Packages"},{"location":"data/loomis-decommission/#custom-software-installations","text":"If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled.","title":"Custom Software Installations"},{"location":"data/loomis-decommission/#decommission-of-old-deprecated-software-trees","text":"As part of the Loomis Decommission, we did not be migrating the old software trees located at /gpfs/loomis/apps/hpc, /gpfs/loomis/apps/hpc.rhel6 and /gpfs/loomis/apps/hpc.rhel7. The deprecated modules can be identified as being prefixed with \"Apps/\", \"GPU/\", \"Libs/\" or \"MPI/\" rather than beginning with the software name. If you are using software modules in one of the old trees, please find an alternative in the current supported tree or reach out to us to install a replacement.","title":"Decommission of Old, Deprecated Software Trees"},{"location":"data/loomis-decommission/#researchers-with-purchased-storage-on-loomis","text":"If you had purchased space that is still active (not expired), we created a new area of the same size for you on Gibbs and transferred your data. If you have purchased storage on /gpfs/loomis that has expired or will be expiring in 2022 and you chose not to renew, any data in that allocation is now retired.","title":"Researchers with Purchased Storage on Loomis"},{"location":"data/mccleary-transfer/","text":"Transfer data from Farnam / Ruddle to McCleary In the process of migrating from Farnam/Ruddle to McCleary, we are requesting researchers migrate their own data. Researchers are encouraged to only transfer data which is actively needed and take this opportunity to archive or delete old data. Transfers should be initiated on Ruddle's or McCleary's transfer nodes and sync'd to either Gibbs project directories ( /gpfs/gibbs/project/GROUP/NETID ) or their McCleary home spaces (which are mounted at /vast/palmer/home.mccleary/NETID ). All users are able to log into the transfer nodes via ssh: [ tl397@ruddle1 ~ ] $ ssh transfer [ tl397@transfer-ruddle ~ ] $ Warning Do not attempt to transfer conda environments to McCleary. Environments are not portable and will not work properly if simply copied. Instead, please export and rebuild environments following our guide . The two tools we recommend for this transfer are rsync and Globus . rsync is a command-line utility which copies files, along with their attributes, with protections against file corruption. Globus is a web app where you can schedule large transfers which occur in the background and provide notifications when complete. 
Since McCleary mounts Farnam and Ruddle's filesystems, these copies are \"local\" copies and should run at high speed. rsync is best suited for smaller data transfers, while Globus is our recommended tool for larger transfers. In this short note we will detail these two approaches. Rsync While rsync is most commonly used for remote transfers between two systems, it is an excellent tool for local work as well. In particular, it's ability to perform tests to make sure that files are transfered properly and to recover from interrupted transfers make it a good option for data migration. There are many configuration possibilities, but we recommend using the following flags: rsync -avP /path/to/existing/data /path/to/new/home/for/data Here the -a will run the transfer in archive mode, which preserves ownership, permissions, and creation/modification times. Additionally, the -v will run in verbose mode where the name of every file is printed out, and -P displays a progress bar. One subtle detail is that rsync changes its behavior based on whether the source path has a trailing / . If one initiates a sync like this: rsync -avP /path/to/existing/data /path/to/new/home/for/data the existing data directory is transferred as a whole entity, including the top-level directory data . However, if the source path includes a trailing / : rsync -avP /path/to/existing/data/ /path/to/new/home/for/data then the contents of data are transferred, omitting the top-level directory. As an example, to transfer a directory (named my_data ) from a YSM project directory on McCleary to your Gibbs project space, you can run: rsync -avP /gpfs/ysm/project/GROUP/NETID/my_data /gpfs/gibbs/project/GROUP/NETID/ Similarly, to transfer a directory ( my_code ) from your YCGA homespace to your new McCleary homespace: rsync -avP /home/NETID/my_code /vast/palmer/home.mccleary/NETID/ where GROUP and NETID are replaced by your specific group/netid. For more detailed information about rsync , please take a look at this nice tutorial ( link ). For rsync transfers that may take a while, it's best to run the transfer inside a tmux virtual login session. This enables you to \"detach\" from the session while the transfer continues in the background. tmux uses special key-strokes to control the session, with the most important being Ctrl-b d (first pressing the control and b keys, releasing, and then pressing d ) which detaches from the current session. To reattach to a detached session, run tmux attach from the same host where tmux was initially started. For more information about tmux , please see their Getting Started Guide . Globus Yale provides dedicated Globus connections for each of the clusters. Transfers can be managed through existing accounts on Ruddle using yale#ruddle , or using McCleary's Globus connection ( Yale CRC McCleary ). For a general getting started with Globus, please check out their website . We have a stand-alone docs page about Globus here , but here we will detail the process to transfer data from YSM (for example) to the Gibbs file system. log in to app.globus.org and use your Yale credentials to authenticate. navigate to the File Manager and access Ruddle or McCleary by searching for the \"collection\" yale#ruddle or Yale CRC McCleary in the left-hand panel. find the files you wish to transfer, using the check-boxes to select any and all files needed. click on the \"Transfer or Sync to\" option and in the right-hand panel also search for the same cluster's collection. 
navigate through the file-browser to find the desired destination for these data (most likely gibbs_project or a subdirectory). start the transfer, click the \"Start\" button on the left-hand side. This will start a background process to transfer all the selected files and directories to their destination. You will receive an email when the transfer completes detailing the size and average speed of the transferred data. Getting help If you run into any issues or if you would like help in setting up your data migration, please feel free to reach out to hpc@yale.edu to request one-on-one support.","title":"Transfer data from Farnam / Ruddle to McCleary"},{"location":"data/mccleary-transfer/#transfer-data-from-farnam-ruddle-to-mccleary","text":"In the process of migrating from Farnam/Ruddle to McCleary, we are requesting researchers migrate their own data. Researchers are encouraged to only transfer data which is actively needed and take this opportunity to archive or delete old data. Transfers should be initiated on Ruddle's or McCleary's transfer nodes and sync'd to either Gibbs project directories ( /gpfs/gibbs/project/GROUP/NETID ) or their McCleary home spaces (which are mounted at /vast/palmer/home.mccleary/NETID ). All users are able to log into the transfer nodes via ssh: [ tl397@ruddle1 ~ ] $ ssh transfer [ tl397@transfer-ruddle ~ ] $ Warning Do not attempt to transfer conda environments to McCleary. Environments are not portable and will not work properly if simply copied. Instead, please export and rebuild environments following our guide . The two tools we recommend for this transfer are rsync and Globus . rsync is a command-line utility which copies files, along with their attributes, with protections against file corruption. Globus is a web app where you can schedule large transfers which occur in the background and provide notifications when complete. Since McCleary mounts Farnam and Ruddle's filesystems, these copies are \"local\" copies and should run at high speed. rsync is best suited for smaller data transfers, while Globus is our recommended tool for larger transfers. In this short note we will detail these two approaches.","title":"Transfer data from Farnam / Ruddle to McCleary"},{"location":"data/mccleary-transfer/#rsync","text":"While rsync is most commonly used for remote transfers between two systems, it is an excellent tool for local work as well. In particular, it's ability to perform tests to make sure that files are transfered properly and to recover from interrupted transfers make it a good option for data migration. There are many configuration possibilities, but we recommend using the following flags: rsync -avP /path/to/existing/data /path/to/new/home/for/data Here the -a will run the transfer in archive mode, which preserves ownership, permissions, and creation/modification times. Additionally, the -v will run in verbose mode where the name of every file is printed out, and -P displays a progress bar. One subtle detail is that rsync changes its behavior based on whether the source path has a trailing / . If one initiates a sync like this: rsync -avP /path/to/existing/data /path/to/new/home/for/data the existing data directory is transferred as a whole entity, including the top-level directory data . However, if the source path includes a trailing / : rsync -avP /path/to/existing/data/ /path/to/new/home/for/data then the contents of data are transferred, omitting the top-level directory. 
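If you are unsure which of the two behaviors you want, rsync's dry-run flag previews the transfer without copying anything (a sketch using the same placeholder paths): rsync -avPn /path/to/existing/data /path/to/new/home/for/data # -n (--dry-run) lists what would be sent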
As an example, to transfer a directory (named my_data ) from a YSM project directory on McCleary to your Gibbs project space, you can run: rsync -avP /gpfs/ysm/project/GROUP/NETID/my_data /gpfs/gibbs/project/GROUP/NETID/ Similarly, to transfer a directory ( my_code ) from your YCGA homespace to your new McCleary homespace: rsync -avP /home/NETID/my_code /vast/palmer/home.mccleary/NETID/ where GROUP and NETID are replaced by your specific group/netid. For more detailed information about rsync , please take a look at this nice tutorial ( link ). For rsync transfers that may take a while, it's best to run the transfer inside a tmux virtual login session. This enables you to \"detach\" from the session while the transfer continues in the background. tmux uses special key-strokes to control the session, with the most important being Ctrl-b d (first pressing the control and b keys, releasing, and then pressing d ) which detaches from the current session. To reattach to a detached session, run tmux attach from the same host where tmux was initially started. For more information about tmux , please see their Getting Started Guide .","title":"Rsync"},{"location":"data/mccleary-transfer/#globus","text":"Yale provides dedicated Globus connections for each of the clusters. Transfers can be managed through existing accounts on Ruddle using yale#ruddle , or using McCleary's Globus connection ( Yale CRC McCleary ). For a general getting started with Globus, please check out their website . We have a stand-alone docs page about Globus here , but here we will detail the process to transfer data from YSM (for example) to the Gibbs file system. log in to app.globus.org and use your Yale credentials to authenticate. navigate to the File Manager and access Ruddle or McCleary by searching for the \"collection\" yale#ruddle or Yale CRC McCleary in the left-hand panel. find the files you wish to transfer, using the check-boxes to select any and all files needed. click on the \"Transfer or Sync to\" option and in the right-hand panel also search for the same cluster's collection. navigate through the file-browser to find the desired destination for these data (most likely gibbs_project or a subdirectory). start the transfer, click the \"Start\" button on the left-hand side. This will start a background process to transfer all the selected files and directories to their destination. You will receive an email when the transfer completes detailing the size and average speed of the transferred data.","title":"Globus"},{"location":"data/mccleary-transfer/#getting-help","text":"If you run into any issues or if you would like help in setting up your data migration, please feel free to reach out to hpc@yale.edu to request one-on-one support.","title":"Getting help"},{"location":"data/permissions/","text":"Share with Cluster Users Home Directories Do not give your home directory group write permissions. This will break your ability to log into the cluster. If you need to share files currently located in your home directory, either move it your project directory or contact us for assistance finding an appropriate location. project and scratch60 links in Home Directories For convenience, we create a symlink, or shortcut, in every home directory called project and palmer_scratch (and ~/scratch60 on Milgram ) that go to your respective storage spaces . However, if another user attempts to access any data via your symlink, they will receive errors related to permissions for your home space. 
You can run mydirectories or readlink - f dirname (replace dirname with the one you are interested in) to get the \"true\" paths, which is more readily accesible to other users. Share Data within your Group By default, all project, purchased allocation and scratch directories are readable by other members of your group. As long as they use the true path (not the shortcut your home directory, see above), no permission changes should be needed. If you want to ensure all new files and directories you create have group write permission, add the following line to your ~/.bashrc files: umask 002 Shared Group Directories Upon request we can setup directories for sharing scripts or data across your research group. These directories can either have read-only permissions for the group (so no one accidentally modifies something) or read and write permissions for all group members. If interested, contact us to request such a directory. Share With Specific Users or Other Groups It can be very useful to create shared directories that can be read and written by multiple users, or all members of a group. The linux command setfacl is useful for this, but can be complicated to use. We recommend that you create a shared directory somewhere in your project or scratch directories, rather than home . When sharing a sub-directory in your project or scratch , you need first share your project or scratch , and then share the sub-directory. Here are some simple scenarios. Share a Directory with All Members of a Group To share a new directory called shared in your project directory with group othergroup : setfacl -m g:othergroup:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m g:othergroup:rwX shared setfacl -d -m g:othergroup:rwX shared Share a Directory with a Particular Person To share a new directory called shared with a person with netid aa111 : setfacl -m u:aa111:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m u:aa111:rwX shared setfacl -d -m u:aa111:rwX shared If the shared directory already exists and contains files and directories, you should run the setfacl commands recursively, using -R: setfacl -R -m u:aa111:rwX shared setfacl -R -d -m u:aa111:rwX shared Note that only the owner of a file or directory can run setfacl on it. Remove Sharing of a Directory To remove a group othergroup from sharing of a directory called shared : setfacl -R -x g:othergroup shared To remove a person with netid aa111 from sharing of a directory called shared : setfacl -R -x u:aa111 shared","title":"Share with Cluster Users"},{"location":"data/permissions/#share-with-cluster-users","text":"","title":"Share with Cluster Users"},{"location":"data/permissions/#home-directories","text":"Do not give your home directory group write permissions. This will break your ability to log into the cluster. If you need to share files currently located in your home directory, either move it your project directory or contact us for assistance finding an appropriate location.","title":"Home Directories"},{"location":"data/permissions/#project-and-scratch60-links-in-home-directories","text":"For convenience, we create a symlink, or shortcut, in every home directory called project and palmer_scratch (and ~/scratch60 on Milgram ) that go to your respective storage spaces . However, if another user attempts to access any data via your symlink, they will receive errors related to permissions for your home space. 
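If you are ever unsure what access another user actually has to one of your directories, getfacl prints the standard permission bits along with any ACLs that have been granted (a minimal sketch; the path is a placeholder): getfacl /gpfs/gibbs/project/GROUP/NETID/shared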
You can run mydirectories or readlink - f dirname (replace dirname with the one you are interested in) to get the \"true\" paths, which is more readily accesible to other users.","title":"project and scratch60 links in Home Directories"},{"location":"data/permissions/#share-data-within-your-group","text":"By default, all project, purchased allocation and scratch directories are readable by other members of your group. As long as they use the true path (not the shortcut your home directory, see above), no permission changes should be needed. If you want to ensure all new files and directories you create have group write permission, add the following line to your ~/.bashrc files: umask 002","title":"Share Data within your Group"},{"location":"data/permissions/#shared-group-directories","text":"Upon request we can setup directories for sharing scripts or data across your research group. These directories can either have read-only permissions for the group (so no one accidentally modifies something) or read and write permissions for all group members. If interested, contact us to request such a directory.","title":"Shared Group Directories"},{"location":"data/permissions/#share-with-specific-users-or-other-groups","text":"It can be very useful to create shared directories that can be read and written by multiple users, or all members of a group. The linux command setfacl is useful for this, but can be complicated to use. We recommend that you create a shared directory somewhere in your project or scratch directories, rather than home . When sharing a sub-directory in your project or scratch , you need first share your project or scratch , and then share the sub-directory. Here are some simple scenarios.","title":"Share With Specific Users or Other Groups"},{"location":"data/permissions/#share-a-directory-with-all-members-of-a-group","text":"To share a new directory called shared in your project directory with group othergroup : setfacl -m g:othergroup:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m g:othergroup:rwX shared setfacl -d -m g:othergroup:rwX shared","title":"Share a Directory with All Members of a Group"},{"location":"data/permissions/#share-a-directory-with-a-particular-person","text":"To share a new directory called shared with a person with netid aa111 : setfacl -m u:aa111:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m u:aa111:rwX shared setfacl -d -m u:aa111:rwX shared If the shared directory already exists and contains files and directories, you should run the setfacl commands recursively, using -R: setfacl -R -m u:aa111:rwX shared setfacl -R -d -m u:aa111:rwX shared Note that only the owner of a file or directory can run setfacl on it.","title":"Share a Directory with a Particular Person"},{"location":"data/permissions/#remove-sharing-of-a-directory","text":"To remove a group othergroup from sharing of a directory called shared : setfacl -R -x g:othergroup shared To remove a person with netid aa111 from sharing of a directory called shared : setfacl -R -x u:aa111 shared","title":"Remove Sharing of a Directory"},{"location":"data/staging/","text":"Stage Data for Compute Jobs Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. Since the permanent home of the data remains on off-cluster storage, you need to transfer a working copy to the cluster temporarily. When your computation finishes, you would then remove the copy and transfer the results to a more permanent location. 
Temporary Storage We recommend staging data into your scratch storage space on the cluster, as the working copy of the data can then be removed manually or left to be deleted (which will happen automatically after 60 days). Interactive Transfers For interactive transfers, please see our Transfer Data page for a more complete list of ways to move data efficiently to and from the clusters. A sample workflow using rsync would be: # connect to the transfer node from the login node [ netID@cluster ~ ] ssh transfer # copy data to temporary cluster storage [ netID@transfer ~ ] $ rsync -avP netID@department_server:/path/to/data $HOME/palmer_scratch/ # process data on cluster [ netID@transfer ~ ] $ sbatch data_processing.sh # return results to permanent storage for safe-keeping [ netID@transfer ~ ] $ rsync -avP $HOME/palmer_scratch/output_data netID@department_server:/path/to/outputs/ Tip To protect your transfer from network interruptions between your computer and the transfer node, launch your rsync inside a tmux session on the transfer node. Transfer Partition Both Grace and McCleary have dedicated data transfer partitions (named transfer) designed for staging data onto the cluster. All users are able to submit jobs to these partitions. Note that each user is limited to running two transfer jobs at a time. If your workflow requires more simultaneous transfers, contact us for assistance. Transfers as Batch Jobs A sample sbatch script for an rsync transfer is shown here: #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer #SBATCH --output=transfer.txt rsync -av netID@department_server:/path/to/data $HOME/palmer_scratch/ This will launch a batch job that will transfer data from the remote server to your scratch directory. Note that this will only work if you have set up password-less logins on the remote host. Transfer Job Dependencies There are sbatch options that allow you to hold a job from running until a previous job finishes. These are called Job Dependencies, and they allow you to include a data-staging step as part of your data-processing pipeline. Consider a workflow where we would like to process data located on a remote server. We can break this into two separate Slurm jobs: a transfer job followed by a processing job. transfer.sbatch #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer rsync -av netID@department_server:/path/to/data $HOME/palmer_scratch/ process.sbatch #!/bin/bash #SBATCH --partition=day #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_process module purge module load miniconda conda activate my_env python $HOME/process_script.py $HOME/palmer_scratch/data First we would submit the transfer job to Slurm: $ sbatch transfer.sbatch Submitted batch job 12345678 Then we can pass this jobID as a dependency for the processing job: $ sbatch --dependency=afterok:12345678 process.sbatch Submitted batch job 12345679 Slurm will now hold the processing job until the transfer finishes: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 12345679 day process netID PD 0:00 1 (Dependency) 12345678 transfer transfer netID R 0:15 1 c01n04 Storage@Yale Transfers Storage@Yale shares are mounted on the transfer partition, enabling you to stage data from these remote servers. The process is somewhat simpler than the above example because we do not need to rsync the data, and can instead use cp directly.
Here, we have modified the transfer.sbatch file from above: transfer.sbatch #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer cp /SAY/standard/my_say_share/data $HOME /palmer_scratch/ This will transfer data from the Storage@Yale share to palmer_scratch where it can be processed on any of the compute nodes.","title":"Stage Data for Compute Jobs"},{"location":"data/staging/#stage-data-for-compute-jobs","text":"Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. Since the permanent home of the data remains on off-cluster storage, you need to transfer a working copy to the cluster temporarily. When your computation finishes, you would then remove the copy and transfer the results to a more permanent location.","title":"Stage Data for Compute Jobs"},{"location":"data/staging/#temporary-storage","text":"We recommend staging data into your scratch storage space on the cluster, as the working copy of the data can then be removed manually or left to be deleted (which will happen automatically after 60-days).","title":"Temporary Storage"},{"location":"data/staging/#interactive-transfers","text":"For interactive transfers, please see our Transfer Data page for a more complete list of ways to move data efficiently to and from the clusters. A sample workflow using rsync would be: # connect to the transfer node from the login node [ netID@cluster ~ ] ssh transfer # copy data to temporary cluster storage [ netID@transfer ~ ] $ rsync -avP netID@department_server:/path/to/data $HOME /palmer_scratch/ # process data on cluster [ netID@transfer ~ ] $ sbatch data_processing.sh # return results to permanent storage for safe-keeping [ netID@transfer ~ ] $ rsync -avP $HOME /palmer_scratch/output_data netID@department_server:/path/to/outputs/ Tip To protect your transfer from network interruptions between your computer and the transfer node, launch your rsync inside a tmux session on the transfer node.","title":"Interactive Transfers"},{"location":"data/staging/#transfer-partition","text":"Both Grace and McCleary have dedicated data transfer partitions (named transfer ) designed for staging data onto the cluster. All users are able to submit jobs to these partitions. Note each users is limited to running two transfer jobs at one time. If your workflow requires more simultaneuous transfers, contact us for assistance.","title":"Transfer Partition"},{"location":"data/staging/#transfers-as-batch-jobs","text":"A sample sbatch script for an rsync transfer is show here: #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer #SBATCH --output=transfer.txt rsync -av netID@department_server:/path/to/data $HOME /palmer_scratch/ This will launch a batch job that will transfer data from remote.host.yale.edu to your scratch directory. Note, this will only work if you have set up password-less logins on the remote host.","title":"Transfers as Batch Jobs"},{"location":"data/staging/#transfer-job-dependencies","text":"There are sbatch options that allow you to hold a job from running until a previous job finishes. These are called Job Dependencies, and they allow you to include a data-staging step as part of your data processing pipe-line. Consider a workflow where we would like to process data located on a remote server. 
We can break this into two separate Slurm jobs: a transfer job followed by a processing job.","title":"Transfer Job Dependencies"},{"location":"data/staging/#transfersbatch","text":"#!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer rsync -av netID@department_server:/path/to/data $HOME /palmer_scratch/","title":"transfer.sbatch"},{"location":"data/staging/#processsbatch","text":"#!/bin/bash #SBATCH --partition=day #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_process module purge module load miniconda conda activate my_env python $HOME /process_script.py $HOME /palmer_scratch/data First we would submit the transfer job to Slurm: $ sbatch transfer.sbatch Submitted batch job 12345678 Then we can pass this jobID as a dependency for the processing job: $ sbatch --dependency = afterok:12345678 process.sbatch Submitted batch job 12345679 Slurm will now hold the processing job until the transfer finishes: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST ( REASON ) 12345679 day process netID PD 0 :00 1 ( Dependency ) 12345678 transfer transfer netID R 0 :15 1 c01n04","title":"process.sbatch"},{"location":"data/staging/#storageyale-transfers","text":"Storage@Yale shares are mounted on the transfer partition, enabling you to stage data from these remote servers. The process is somewhat simpler than the above example because we do not need to rsync the data, and can instead use cp directly. Here, we have modified the transfer.sbatch file from above:","title":"Storage@Yale Transfers"},{"location":"data/staging/#transfersbatch_1","text":"#!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer cp /SAY/standard/my_say_share/data $HOME /palmer_scratch/ This will transfer data from the Storage@Yale share to palmer_scratch where it can be processed on any of the compute nodes.","title":"transfer.sbatch"},{"location":"data/transfer/","text":"Transfer Data For all transfer methods, you need to have set up your account on the cluster(s) you want to tranfer data to/from. Data Transfer Nodes Each cluster has dedicated nodes specially networked for high speed transfers both on and off-campus using the Yale Science Network. You may use transfer nodes to transfer data from your local machine using one of the below methods. From off-cluster, the nodes are accessible at the following hostnames. You must still be on-campus or on the VPN to access the transfer nodes. Cluster Transfer Node Grace transfer-grace.ycrc.yale.edu McCleary transfer-mccleary.ycrc.yale.edu Milgram transfer-milgram.ycrc.yale.edu From the login node of any cluster, you can ssh into the transfer node. This is useful for transferring data to or from locations other than your local machine (see below for details). [netID@cluster ~] ssh transfer Transferring Data to/from Your Local Machine Graphical Transfer Tools OOD Web Transfers On each cluster, you can use their respective Open OnDemand portals to transfer files. This works best for small numbers of relatively small files. You can also directly edit scripts through this interface, alleviating the need to transfer scripts to your computer to edit. MobaXterm (Windows) MobaXterm is an all-in-one graphical client for Windows that includes a transfer pane for each cluster you connect to. Once you have established a connection to the cluster, click on the \"Sftp\" tab in the left sidebar to see your files on the cluster. 
You can drag-and-drop data into and out of the SFTP pane to upload and download, respectively. Cyberduck You can also transfer files between your local computer and a cluster using an FTP client, such as Cyberduck (OSX/Windows) . You will need to configure the client with: Your netid as the \"Username\" Cluster transfer node (see above) as the \"Server\" Select your private key as the \"SSH Private Key\" Leave \"Password\" blank (you will be prompted on connection for your ssh key passphrase) An example configuration of Cyberduck is shown below. Cyberduck on McCleary and Milgram McCleary and Milgram require Multi-Factor Authentication so there are a couple additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call Large File Transfers (Globus) You can use the Globus service to perform larger data transfers between your local machine and the clusters. Globus provides a robust and resumable way to transfer larger files or datasets. Please see our Globus page for Yale-specific documentation and their official docs to get started. Command-Line Transfer Tools scp and rsync (macOS/Linux/Linux on Windows) Linux and macOS users can use scp or rsync . Use the hostname of the cluster transfer node (see above) to transfer files. These transfers must be initiated from your local machine. scp and sftp are both used from a Terminal window. The basic syntax of scp is scp [ from ] [ to ] The from and to can each be a filename or a directory/folder on the computer you are typing the command on or a remote host (e.g. the transfer node). Example: Transfer a File from Your Computer to a Cluster Using the example netid abc123 , following is run on your computer's local terminal. scp myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, myfile.txt is copied to the directory /home/fas/admins/abc123/test: on Grace. This example assumes that myfile.txt is in your current directory. You may also specify the full path of myfile.txt . scp /home/xyz/myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test Example: Transfer a Directory to a Cluster scp -r mydirectory abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, the contents of mydirectory are transferred. The -r indicates that the copy is recursive. Example: Transfer Files from the Cluster to Your Computer Assuming you would like the files copied to your current directory: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt . Note that . represents your current working directory. To specify the destination, simply replace the . with the full path: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt /path/myfolder Transfer Data to/from Other Locations Globus Endpoints Globus is a web-enabled GridFTP service that transfers large datasets fast, securely, and reliably between computers configured to be endpoints. Please see our Globus page for Yale-specific documentation and their official docs to get started. 
We have configured endpoints for most of the Yale clusters, and many other institutions and compute facilities have Globus endpoints as well. You can also use Globus to transfer data to/from Eliapps Google Drive and S3 buckets. Cluster Transfer Nodes You can use the cluster transfer nodes to download/upload data to locations off-cluster. For data that is primarily hosted elsewhere and is only needed on the cluster temporarily, see our guide on Staging Data for additional information. For any data that is hosted outside of Yale, you will need to initiate the transfer from the cluster's data transfer node as the clusters are not accessible without the VPN. On Milgram, which does not have a transfer node, you can initiate the transfers from a login node. However, please be mindful that other users will also be using the login nodes for regular cluster operations. Tip If you are running a large transfer without Globus , run it inside a tmux session on the transfer node. This protects your transfer from network interruptions between your computer and the transfer node. rsync # connect to the transfer node from the login node [netID@cluster ~] ssh transfer # copy data to cluster storage [netID@transfer ~]$ rsync -avP netID@department_server:/path/to/data $HOME/scratch60/ Rclone To move data to and from cloud storage (Box, Dropbox, Wasabi, AWS S3, or Google Cloud Storage, etc.), we recommend using Rclone . It is installed on all of the clusters and can be installed on your computer. You will need to configure it for each kind of storage you would like to transfer to with: rclone config You'll be prompted for a name for the connection (e.g. mys3), and then details about the connection. Once you've saved that configuration, you can connect to the transfer node (using ssh transfer from the login node) and then use that connection name to copy files with similar syntax to scp and rsync : rclone copy localpath/myfile mys3:bucketname/ rclone sync localpath/mydir mys3:bucketname/remotedir We recommend that you protect your configurations with a password. You'll see that as an option when you run rclone config. Please see our Rclone page for additional information on how to set up and use Rclone on the YCRC clusters. For the full Rclone documentation, please refer to the official site . Sites Behind a VPN If you need to transfer data to or from an external site that is only accessible via VPN, please contact us for assistance as we might be able to provide a workaround to enable a direct transfer between the YCRC clusters and your external site.","title":"Transfer to Cluster"},{"location":"data/transfer/#transfer-data","text":"For all transfer methods, you need to have set up your account on the cluster(s) you want to transfer data to/from.","title":"Transfer Data"},{"location":"data/transfer/#data-transfer-nodes","text":"Each cluster has dedicated nodes specially networked for high speed transfers both on and off-campus using the Yale Science Network. You may use transfer nodes to transfer data from your local machine using one of the below methods. From off-cluster, the nodes are accessible at the following hostnames. You must still be on-campus or on the VPN to access the transfer nodes. Cluster Transfer Node Grace transfer-grace.ycrc.yale.edu McCleary transfer-mccleary.ycrc.yale.edu Milgram transfer-milgram.ycrc.yale.edu From the login node of any cluster, you can ssh into the transfer node. This is useful for transferring data to or from locations other than your local machine (see below for details). 
[netID@cluster ~] ssh transfer","title":"Data Transfer Nodes"},{"location":"data/transfer/#transferring-data-tofrom-your-local-machine","text":"","title":"Transferring Data to/from Your Local Machine"},{"location":"data/transfer/#graphical-transfer-tools","text":"","title":"Graphical Transfer Tools"},{"location":"data/transfer/#ood-web-transfers","text":"On each cluster, you can use their respective Open OnDemand portals to transfer files. This works best for small numbers of relatively small files. You can also directly edit scripts through this interface, alleviating the need to transfer scripts to your computer to edit.","title":"OOD Web Transfers"},{"location":"data/transfer/#mobaxterm-windows","text":"MobaXterm is an all-in-one graphical client for Windows that includes a transfer pane for each cluster you connect to. Once you have established a connection to the cluster, click on the \"Sftp\" tab in the left sidebar to see your files on the cluster. You can drag-and-drop data into and out of the SFTP pane to upload and download, respectively.","title":"MobaXterm (Windows)"},{"location":"data/transfer/#cyberduck","text":"You can also transfer files between your local computer and a cluster using an FTP client, such as Cyberduck (OSX/Windows) . You will need to configure the client with: Your netid as the \"Username\" Cluster transfer node (see above) as the \"Server\" Select your private key as the \"SSH Private Key\" Leave \"Password\" blank (you will be prompted on connection for your ssh key passphrase) An example configuration of Cyberduck is shown below.","title":"Cyberduck"},{"location":"data/transfer/#cyberduck-on-mccleary-and-milgram","text":"McCleary and Milgram require Multi-Factor Authentication so there are a couple additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call","title":"Cyberduck on McCleary and Milgram"},{"location":"data/transfer/#large-file-transfers-globus","text":"You can use the Globus service to perform larger data transfers between your local machine and the clusters. Globus provides a robust and resumable way to transfer larger files or datasets. Please see our Globus page for Yale-specific documentation and their official docs to get started.","title":"Large File Transfers (Globus)"},{"location":"data/transfer/#command-line-transfer-tools","text":"","title":"Command-Line Transfer Tools"},{"location":"data/transfer/#scp-and-rsync-macoslinuxlinux-on-windows","text":"Linux and macOS users can use scp or rsync . Use the hostname of the cluster transfer node (see above) to transfer files. These transfers must be initiated from your local machine. scp and sftp are both used from a Terminal window. The basic syntax of scp is scp [ from ] [ to ] The from and to can each be a filename or a directory/folder on the computer you are typing the command on or a remote host (e.g. the transfer node).","title":"scp and rsync (macOS/Linux/Linux on Windows)"},{"location":"data/transfer/#example-transfer-a-file-from-your-computer-to-a-cluster","text":"Using the example netid abc123 , following is run on your computer's local terminal. 
scp myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, myfile.txt is copied to the directory /home/fas/admins/abc123/test on Grace. This example assumes that myfile.txt is in your current directory. You may also specify the full path of myfile.txt . scp /home/xyz/myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test","title":"Example: Transfer a File from Your Computer to a Cluster"},{"location":"data/transfer/#example-transfer-a-directory-to-a-cluster","text":"scp -r mydirectory abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, the contents of mydirectory are transferred. The -r indicates that the copy is recursive.","title":"Example: Transfer a Directory to a Cluster"},{"location":"data/transfer/#example-transfer-files-from-the-cluster-to-your-computer","text":"Assuming you would like the files copied to your current directory: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt . Note that . represents your current working directory. To specify the destination, simply replace the . with the full path: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt /path/myfolder","title":"Example: Transfer Files from the Cluster to Your Computer"},{"location":"data/transfer/#transfer-data-tofrom-other-locations","text":"","title":"Transfer Data to/from Other Locations"},{"location":"data/transfer/#globus-endpoints","text":"Globus is a web-enabled GridFTP service that transfers large datasets fast, securely, and reliably between computers configured to be endpoints. Please see our Globus page for Yale-specific documentation and their official docs to get started. We have configured endpoints for most of the Yale clusters, and many other institutions and compute facilities have Globus endpoints as well. You can also use Globus to transfer data to/from Eliapps Google Drive and S3 buckets.","title":"Globus Endpoints"},{"location":"data/transfer/#cluster-transfer-nodes","text":"You can use the cluster transfer nodes to download/upload data to locations off-cluster. For data that is primarily hosted elsewhere and is only needed on the cluster temporarily, see our guide on Staging Data for additional information. For any data that is hosted outside of Yale, you will need to initiate the transfer from the cluster's data transfer node as the clusters are not accessible without the VPN. On Milgram, which does not have a transfer node, you can initiate the transfers from a login node. However, please be mindful that other users will also be using the login nodes for regular cluster operations. Tip If you are running a large transfer without Globus , run it inside a tmux session on the transfer node. This protects your transfer from network interruptions between your computer and the transfer node.","title":"Cluster Transfer Nodes"},{"location":"data/transfer/#rsync","text":"# connect to the transfer node from the login node [netID@cluster ~] ssh transfer # copy data to cluster storage [netID@transfer ~]$ rsync -avP netID@department_server:/path/to/data $HOME/scratch60/","title":"rsync"},{"location":"data/transfer/#rclone","text":"To move data to and from cloud storage (Box, Dropbox, Wasabi, AWS S3, or Google Cloud Storage, etc.), we recommend using Rclone . It is installed on all of the clusters and can be installed on your computer. 
You will need to configure it for each kind of storage you would like to transfer to with: rclone configure You'll be prompted for a name for the connection (e.g mys3), and then details about the connection. Once you've saved that configuration, you can connect to the transfer node (using ssh transfer from the login node) and then use that connection name to copy files with similar syntax to scp and rsync : rclone copy localpath/myfile mys3:bucketname/ rclone sync localpath/mydir mys3:bucketname/remotedir We recommend that you protect your configurations with a password. You'll see that as an option when you run rclone config. Please see our Rclone page for additional information on how to set up and use Rclone on the YCRC clusters. For all the Rclone documentaion please refer to the official site .","title":"Rclone"},{"location":"data/transfer/#sites-behind-a-vpn","text":"If you need to transfer data to or from an external site that is only accessible via VPN, please contact us for assistance as we might be able to provide a workaround to enable a direct transfer between the YCRC clusters and your external site.","title":"Sites Behind a VPN"},{"location":"data/ycga-data/","text":"YCGA Data Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga/sequencers on McCleary . YCGA Access Retention Policy The McCleary high-performance computing system has specific resources that are dedicated to YCGA users. This includes a slurm partition (\u2018ycga\u2019) and a large parallel storage system (/gpfs/ycga). The following policy guidelines govern the use of these resources on McCleary for data storage and analysis. Yale University Faculty User All Yale PIs using YCGA for library preparation and/or sequencing will have an additional 5 TB storage area called \u2018work\u2019 for data storage. This is in addition to the 5 TB storage area called \u2018project\u2019 that all McCleary groups receive. Currently, neither work or project storage is backed up. Users are responsible for protecting their own data. All Fastq files are available on the /gpfs/ycga storage system for one year. After that, the files are available in an archive that allows self-service retrieval, as described in the link above. Issues or questions about archived data can be addressed to ycga@yale.edu. Users processing sequence data on McCleary should be careful to submit their jobs to the \u2018ycga\u2019 partition. Jobs submitted to other partitions may incur additional charges. Members of Yale PI labs using YCGA for library preparation and/or sequencing may apply for accounts on McCleary with PI\u2019s approval. Each Yale PI lab will have a dedicated secure directory to store their data, and permission to lab members will be granted with the authorization of the respective PI. Furthermore, such approval will be terminated upon request from the PI or termination of Yale Net ID. Lab members moving to a new university will get access to HPC resources for an additional six months only upon permission from Yale PI. If Yale NetID is no longer accessible, former Yale members who were YCGA users should request a Sponsored Identity NetID from their business office. Sponsored Identity NetIDs will be valid for six months. Such users will also need to request VPN access. A PI moving to a new university to establish their lab will have access to their data for one year from the termination of their Yale position. 
During this time, the PI or one lab member from the new lab will be provided access to the HPC system. Request for Guest NetID should be made to their business office. Guest NetID will be valid for one year. Any new Yale faculty member will be given access to McCleary once they start using YCGA services. Users not utilizing the YCGA services will not be provided access to McCleary high- performance computing system. External Collaborators Access to McCleary can be granted to collaborating labs, with the authorization of the respective Yale PI. A maximum of one account per collaborating lab will be granted. Furthermore, such approval will be terminated upon request from the PI. Request for a Sponsored Identity NetID should be made to the Yale PI\u2019s business office. Guest NetID will be valid for one year. The expectation is that the collaborator, with PI consent, will download data from the McCleary HPC system to their own internal system for data analysis. Non-Yale Users Users not affiliated with Yale University will not be provided access to McCleary high- performance computing system. YCGA Data Retention Policy Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files). ~45 days after sequencing, the raw bcl files are deleted. ~60 days after sequencing, the fastq files are written to an archive. This archive exists in two geographically distinct copies for safety. ~365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the archive. Data is retained on the archive indefinitely. See below for instructions for retrieving archived data. All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the archive. Disaster recovery is provided by the archive copy. YCGA will send you an email informing you that your data is ready, and will include a url that looks like: http://fcb.ycga.yale.edu:3010/ randomstring /sample_dir_001 You can use that link to download your data in a browser, but if you plan to process the data on McCleary, it is better to make a soft link to the data, rather than copying it. To find the actual location of your data, do: $ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001 Illumina sequencing data For Illumina data (not singlecell or pacbio data), you can browse to the YCGA-provided URL and find a file ruddle_paths.txt that contains the true locations of the files. Alternatively, you can use the ycgaFastq tool to easily make soft links to the sequencing files: export PATH = $PATH :/gpfs/gibbs/pi/ycga/mane/ycga_bioinfo/bin_May2023 $ ycgaFastq fcb.ycga.yale.edu:3010/randomstring/sample_dir_001 ycgaFastq can also be used to retrieve data that has been archived. 
The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name: $ ycgaFastq rdb9 AHFH66DSXX If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that, even if the run has been archived and deleted: $ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438 If you have a manifest file that contains the paths to all of the data files in a dataset, you can use ycgaFastq as well: $ ycgaFastq manifest.txt ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us. Tip Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our below . Retrieve Data from the Archive Info The sequence archive /SAY/archive/YCGa-72009-YCGA-A2 is only mounted on the transfer node and transfer partition. You must ssh to transfer, or submit a job (batch or interactive) to the transfer partition, in order to access and download archived sequence data. In the sequencing data archive, a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of files the file is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/ /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/ /ycga-ba/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ /gpfs/ycga/sequencers/illumina/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/ /gpfs/gibbs/pi/ycga/pacbio/ /SAY/archive/YCGA-729009-YCGA-A2/archive/pacbio/ You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files over 500GB, sometimes fail to download. If you run into problems, contact us at hpc@yale.edu and we can manually download it. cd ~/palmer_scratch/somedir tar \u2013xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip \u2013d M20_ACAGTG_L008_R1_009.fastq.qp If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the ycga-public module. module load ycga-public locateRun C9374AN Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node and make sure to allocate sufficient resources to your job, e.g. \u2013c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See above . 
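As a loose illustration of that tip, here is a minimal sketch of a batch script that decompresses every quip file under a restored project directory in parallel; the directory name my_restored_project is only a placeholder, and the resource requests simply mirror the suggestion above.
#!/bin/bash
#SBATCH -p day
#SBATCH -c 20
#SBATCH --mem=100G
#SBATCH --job-name=unquip
# load the Quip module, then decompress all .fastq.qp files, running up to one per allocated CPU at a time
module load Quip
cd ~/palmer_scratch/my_restored_project
find . -name '*.fastq.qp' -print0 | xargs -0 -n 1 -P "$SLURM_CPUS_PER_TASK" quip -d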
Example Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Get session on transfer partition salloc -p transfer module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar , matching our netid. step 4 First, copy the tarball to scratch. To do this you must be on the transfer partition or transfer node, since /SAY is only mounted there. cd ~/palmer_scratch rsync -L -v /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar . step 5 Submit a batch job to use the restore utility to uncompress the fastq files from the tar file. In our example we'll use 32 cpus. This is not done using the transfer partition, but rather the day partition, since day will allow you more cpus. The restore will likely take several minutes. To see progress, you can use the -v flag. Put the following code in a batch script (e.g. myrestore.sh): #!/bin/bash #SBATCH -c 32 #SBATCH -p day module load ycga-public restore -v -n $SLURM_CPUS_PER_TASK -t 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar Then submit the job using sbatch: sbatch myrestore.sh The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"YCGA Data"},{"location":"data/ycga-data/#ycga-data","text":"Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga/sequencers on McCleary .","title":"YCGA Data"},{"location":"data/ycga-data/#ycga-access-retention-policy","text":"The McCleary high-performance computing system has specific resources that are dedicated to YCGA users. This includes a slurm partition (\u2018ycga\u2019) and a large parallel storage system (/gpfs/ycga). The following policy guidelines govern the use of these resources on McCleary for data storage and analysis.","title":"YCGA Access Retention Policy"},{"location":"data/ycga-data/#yale-university-faculty-user","text":"All Yale PIs using YCGA for library preparation and/or sequencing will have an additional 5 TB storage area called \u2018work\u2019 for data storage. This is in addition to the 5 TB storage area called \u2018project\u2019 that all McCleary groups receive. Currently, neither work nor project storage is backed up. Users are responsible for protecting their own data. All Fastq files are available on the /gpfs/ycga storage system for one year. After that, the files are available in an archive that allows self-service retrieval, as described in the link above. Issues or questions about archived data can be addressed to ycga@yale.edu. 
Users processing sequence data on McCleary should be careful to submit their jobs to the \u2018ycga\u2019 partition. Jobs submitted to other partitions may incur additional charges. Members of Yale PI labs using YCGA for library preparation and/or sequencing may apply for accounts on McCleary with PI\u2019s approval. Each Yale PI lab will have a dedicated secure directory to store their data, and permission to lab members will be granted with the authorization of the respective PI. Furthermore, such approval will be terminated upon request from the PI or termination of Yale Net ID. Lab members moving to a new university will get access to HPC resources for an additional six months only upon permission from Yale PI. If Yale NetID is no longer accessible, former Yale members who were YCGA users should request a Sponsored Identity NetID from their business office. Sponsored Identity NetIDs will be valid for six months. Such users will also need to request VPN access. A PI moving to a new university to establish their lab will have access to their data for one year from the termination of their Yale position. During this time, the PI or one lab member from the new lab will be provided access to the HPC system. Request for Guest NetID should be made to their business office. Guest NetID will be valid for one year. Any new Yale faculty member will be given access to McCleary once they start using YCGA services. Users not utilizing the YCGA services will not be provided access to McCleary high- performance computing system.","title":"Yale University Faculty User"},{"location":"data/ycga-data/#external-collaborators","text":"Access to McCleary can be granted to collaborating labs, with the authorization of the respective Yale PI. A maximum of one account per collaborating lab will be granted. Furthermore, such approval will be terminated upon request from the PI. Request for a Sponsored Identity NetID should be made to the Yale PI\u2019s business office. Guest NetID will be valid for one year. The expectation is that the collaborator, with PI consent, will download data from the McCleary HPC system to their own internal system for data analysis.","title":"External Collaborators"},{"location":"data/ycga-data/#non-yale-users","text":"Users not affiliated with Yale University will not be provided access to McCleary high- performance computing system.","title":"Non-Yale Users"},{"location":"data/ycga-data/#ycga-data-retention-policy","text":"Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files). ~45 days after sequencing, the raw bcl files are deleted. ~60 days after sequencing, the fastq files are written to an archive. This archive exists in two geographically distinct copies for safety. ~365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the archive. Data is retained on the archive indefinitely. See below for instructions for retrieving archived data. All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the archive. Disaster recovery is provided by the archive copy. 
YCGA will send you an email informing you that your data is ready, and will include a url that looks like: http://fcb.ycga.yale.edu:3010/ randomstring /sample_dir_001 You can use that link to download your data in a browser, but if you plan to process the data on McCleary, it is better to make a soft link to the data, rather than copying it. To find the actual location of your data, do: $ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001","title":"YCGA Data Retention Policy"},{"location":"data/ycga-data/#illumina-sequencing-data","text":"For Illumina data (not singlecell or pacbio data), you can browse to the YCGA-provided URL and find a file ruddle_paths.txt that contains the true locations of the files. Alternatively, you can use the ycgaFastq tool to easily make soft links to the sequencing files: export PATH = $PATH :/gpfs/gibbs/pi/ycga/mane/ycga_bioinfo/bin_May2023 $ ycgaFastq fcb.ycga.yale.edu:3010/randomstring/sample_dir_001 ycgaFastq can also be used to retrieve data that has been archived. The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name: $ ycgaFastq rdb9 AHFH66DSXX If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that, even if the run has been archived and deleted: $ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438 If you have a manifest file that contains the paths to all of the data files in a dataset, you can use ycgaFastq as well: $ ycgaFastq manifest.txt ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us. Tip Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our below .","title":"Illumina sequencing data"},{"location":"data/ycga-data/#retrieve-data-from-the-archive","text":"Info The sequence archive /SAY/archive/YCGa-72009-YCGA-A2 is only mounted on the transfer node and transfer partition. You must ssh to transfer, or submit a job (batch or interactive) to the transfer partition, in order to access and download archived sequence data. In the sequencing data archive, a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of files the file is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/ /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/ /ycga-ba/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ /gpfs/ycga/sequencers/illumina/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/ /gpfs/gibbs/pi/ycga/pacbio/ /SAY/archive/YCGA-729009-YCGA-A2/archive/pacbio/ You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files over 500GB, sometimes fail to download. 
If you run into problems, contact us at hpc@yale.edu and we can manually download it. cd ~/palmer_scratch/somedir tar \u2013xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip \u2013d M20_ACAGTG_L008_R1_009.fastq.qp If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the ycga-public module. module load ycga-public locateRun C9374AN Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node and make sure to allocate sufficient resources to your job, e.g. \u2013c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See above .","title":"Retrieve Data from the Archive"},{"location":"data/ycga-data/#example","text":"Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Get session on transfer partition salloc -p transfer module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar , matching our netid. step 4 First, copy the tarball to scratch. To do this you must be on the transfer partition or transfer node, since /SAY is only mounted there. cd ~/palmer_scratch rsync -L -v /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar . step 5 Submit a batch job to use the restore utility to uncompress the fastq files from the tar file. In our example we'll use 32 cpus. This is not done using the transfer partition, but rather the day partition, since day will allow you more cpus. The restore will likely take several minutes. To see progress, you can use the -v flag. Put the following code in a batch script (e.g. 
myrestore.sh): #!/bin/bash #SBATCH -c 32 #SBATCH -p day module load ycga-public restore -v -n $SLURM_CPUS_PER_TASK -t 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar Then submit the job using sbatch: sbatch myrestore.sh The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"Example"},{"location":"news/2022-02-grace/","text":"Grace Maintenance February 3-6, 2022 Software Updates Latest security patches applied Slurm updated to version 21.08.5 NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01) Singularity updated to version 3.8.5 Open OnDemand updated to version 2.0.20 Hardware Updates Changes have been made to networking to improve performance of certain older compute nodes Changes to Grace Home Directories During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data intensive compute jobs. Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/ to /vast/palmer/home.grace/ . Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. Please update any scripts and workflows accordingly. Interactive Jobs We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash . In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables. Palmer scratch Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.","title":"2022 02 grace"},{"location":"news/2022-02-grace/#grace-maintenance","text":"February 3-6, 2022","title":"Grace Maintenance"},{"location":"news/2022-02-grace/#software-updates","text":"Latest security patches applied Slurm updated to version 21.08.5 NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01) Singularity updated to version 3.8.5 Open OnDemand updated to version 2.0.20","title":"Software Updates"},{"location":"news/2022-02-grace/#hardware-updates","text":"Changes have been made to networking to improve performance of certain older compute nodes","title":"Hardware Updates"},{"location":"news/2022-02-grace/#changes-to-grace-home-directories","text":"During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data intensive compute jobs. Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/ to /vast/palmer/home.grace/ . Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. 
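As a hedged sketch of what that update might look like (the ~/scripts search path and my_analysis directory below are only examples):
# find any scripts that still hard-code the retired Loomis home path
grep -rln '/gpfs/loomis/home.grace' ~/scripts/
# inside job scripts, refer to your home directory through $HOME rather than an absolute path
cd "$HOME"/my_analysis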
Please update any scripts and workflows accordingly.","title":"Changes to Grace Home Directories"},{"location":"news/2022-02-grace/#interactive-jobs","text":"We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash . In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables.","title":"Interactive Jobs"},{"location":"news/2022-02-grace/#palmer-scratch","text":"Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.","title":"Palmer scratch"},{"location":"news/2022-02/","text":"February 2022 Announcements Grace Maintenance The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. Data Transfers For non-Milgram users doing data transfers, transfers should not be performed on the login nodes. We have a few alternative ways to get better networking and reduce the impact on the clusters\u2019 login nodes: Dedicated transfer node . Each cluster has a dedicated transfer node, transfer-.hpc.yale.edu . You can ssh directly to this node and run commands. \u201ctransfer\u201d Slurm partition . This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer \u2026 *For recurring or periodic data transfers (such as using cron), please use Slurm\u2019s scrontab to schedule jobs that run on the transfer partition instead. Globus . For robust transfers of larger amount of data, see our Globus documentation. More info about data transfers can be found in our Data Transfer documentation. Software Highlights Rclone is now installed on all nodes and loading the module is no longer necessary. MATLAB/2021b is now on all clusters. Julia/1.7.1-linux-x86_64 is now on all clusters. Mathematica/13.0.0 is now on Grace. QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace. Mathematica documentation has been updated with regards to configuring parallel jobs.","title":"2022 02"},{"location":"news/2022-02/#february-2022","text":"","title":"February 2022"},{"location":"news/2022-02/#announcements","text":"","title":"Announcements"},{"location":"news/2022-02/#grace-maintenance","text":"The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.","title":"Grace Maintenance"},{"location":"news/2022-02/#data-transfers","text":"For non-Milgram users doing data transfers, transfers should not be performed on the login nodes. We have a few alternative ways to get better networking and reduce the impact on the clusters\u2019 login nodes: Dedicated transfer node . Each cluster has a dedicated transfer node, transfer-.hpc.yale.edu . You can ssh directly to this node and run commands. \u201ctransfer\u201d Slurm partition . This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer \u2026 *For recurring or periodic data transfers (such as using cron), please use Slurm\u2019s scrontab to schedule jobs that run on the transfer partition instead. 
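For example, a recurring staging job could be written as a scrontab entry along these lines (a rough sketch; the remote host, paths, and schedule are placeholders):
# open your Slurm crontab for editing
scrontab -e
# run a nightly rsync at 2:00 AM on the transfer partition
#SCRON --partition=transfer
#SCRON --time=2:00:00
#SCRON --job-name=nightly_sync
0 2 * * * rsync -a netID@department_server:/path/to/data $HOME/palmer_scratch/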
Globus . For robust transfers of larger amount of data, see our Globus documentation. More info about data transfers can be found in our Data Transfer documentation.","title":"Data Transfers"},{"location":"news/2022-02/#software-highlights","text":"Rclone is now installed on all nodes and loading the module is no longer necessary. MATLAB/2021b is now on all clusters. Julia/1.7.1-linux-x86_64 is now on all clusters. Mathematica/13.0.0 is now on Grace. QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace. Mathematica documentation has been updated with regards to configuring parallel jobs.","title":"Software Highlights"},{"location":"news/2022-03/","text":"March 2022 Announcements Snapshots Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions. OOD File Browser Tip: Shortcuts You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts. Software Highlights R/4.1.0-foss-2020b is now on Grace. GCC/11.2.0 is now on Grace.","title":"2022 03"},{"location":"news/2022-03/#march-2022","text":"","title":"March 2022"},{"location":"news/2022-03/#announcements","text":"","title":"Announcements"},{"location":"news/2022-03/#snapshots","text":"Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions.","title":"Snapshots"},{"location":"news/2022-03/#ood-file-browser-tip-shortcuts","text":"You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts.","title":"OOD File Browser Tip: Shortcuts"},{"location":"news/2022-03/#software-highlights","text":"R/4.1.0-foss-2020b is now on Grace. GCC/11.2.0 is now on Grace.","title":"Software Highlights"},{"location":"news/2022-04-farnam/","text":"Farnam Maintenance April 4-7, 2022 Software Updates Security updates Slurm updated to 21.08.6 NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01) Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Open OnDemand updated to 2.0.20 Hardware Updates Four new nodes with 4 NVIDIA GTX3090 GPUs each have been added Changes to the bigmem Partition Jobs requesting less than 120G of memory are no longer allowed in the \"bigmem\" partition. Please submit these jobs to the general or scavenge partitions instead. Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 04 farnam"},{"location":"news/2022-04-farnam/#farnam-maintenance","text":"April 4-7, 2022","title":"Farnam Maintenance"},{"location":"news/2022-04-farnam/#software-updates","text":"Security updates Slurm updated to 21.08.6 NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01) Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Open OnDemand updated to 2.0.20","title":"Software Updates"},{"location":"news/2022-04-farnam/#hardware-updates","text":"Four new nodes with 4 NVIDIA GTX3090 GPUs each have been added","title":"Hardware Updates"},{"location":"news/2022-04-farnam/#changes-to-the-bigmem-partition","text":"Jobs requesting less than 120G of memory are no longer allowed in the \"bigmem\" partition. Please submit these jobs to the general or scavenge partitions instead.","title":"Changes to the bigmem Partition"},{"location":"news/2022-04-farnam/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-04/","text":"April 2022 Announcements Updates to R on Open OnDemand RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st. Improvements to R install.packages Paths Starting with the R 4.1.0 software module, we now automatically set an environment variable ( R_LIBS_USER ) which directs these packages to be stored in your project space. This will helps ensure that packages are not limited by home-space quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change. Instructions for Running a MySQL Server on the Clusters Occasionally it could be useful for a user to run their own MySQL database server on one of the clusters. Until now, that has not been possible, but recently we found a way using singularity. Instructions may be found in our new MySQL guide . Software Highlights R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation . R/4.1.0-foss-2020b is now available on all clusters. Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.","title":"2022 04"},{"location":"news/2022-04/#april-2022","text":"","title":"April 2022"},{"location":"news/2022-04/#announcements","text":"","title":"Announcements"},{"location":"news/2022-04/#updates-to-r-on-open-ondemand","text":"RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st.","title":"Updates to R on Open OnDemand"},{"location":"news/2022-04/#improvements-to-r-installpackages-paths","text":"Starting with the R 4.1.0 software module, we now automatically set an environment variable ( R_LIBS_USER ) which directs these packages to be stored in your project space. 
This will helps ensure that packages are not limited by home-space quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change.","title":"Improvements to R install.packages Paths"},{"location":"news/2022-04/#instructions-for-running-a-mysql-server-on-the-clusters","text":"Occasionally it could be useful for a user to run their own MySQL database server on one of the clusters. Until now, that has not been possible, but recently we found a way using singularity. Instructions may be found in our new MySQL guide .","title":"Instructions for Running a MySQL Server on the Clusters"},{"location":"news/2022-04/#software-highlights","text":"R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation . R/4.1.0-foss-2020b is now available on all clusters. Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.","title":"Software Highlights"},{"location":"news/2022-05-ruddle/","text":"Ruddle Maintenance May 2, 2022 Software Updates Security updates Slurm updated to 21.08.7 Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 05 ruddle"},{"location":"news/2022-05-ruddle/#ruddle-maintenance","text":"May 2, 2022","title":"Ruddle Maintenance"},{"location":"news/2022-05-ruddle/#software-updates","text":"Security updates Slurm updated to 21.08.7 Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7","title":"Software Updates"},{"location":"news/2022-05-ruddle/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-05/","text":"May 2022 Announcements Ruddle Maintenance The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details. Remote Visualization with Hardware Acceleration VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/ . Software Highlights Singularity is now called \"Apptainer\". Singularity is officially named \u201cApptainer\u201d as part of its move to the Linux Foundation. The new command apptainer works as drop-in replacement for singularity . However, the previous singularity command will also continue to work for the foreseeable future so no change is needed. 
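To illustrate the drop-in behavior (the container image name below is just an example), the two commands are interchangeable:
# both invocations run the same container image
singularity exec my_tools.sif python --version
apptainer exec my_tools.sif python --version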
The upgrade to Apptainer is on Grace, Farnam and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance. Slurm has been upgraded to version 21.08.6 on Grace MATLAB/2022a is available on all clusters","title":"2022 05"},{"location":"news/2022-05/#may-2022","text":"","title":"May 2022"},{"location":"news/2022-05/#announcements","text":"","title":"Announcements"},{"location":"news/2022-05/#ruddle-maintenance","text":"The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.","title":"Ruddle Maintenance"},{"location":"news/2022-05/#remote-visualization-with-hardware-acceleration","text":"VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/ .","title":"Remote Visualization with Hardware Acceleration"},{"location":"news/2022-05/#software-highlights","text":"Singularity is now called \"Apptainer\". Singularity is officially named \u201cApptainer\u201d as part of its move to the Linux Foundation. The new command apptainer works as drop-in replacement for singularity . However, the previous singularity command will also continue to work for the foreseeable future so no change is needed. The upgrade to Apptainer is on Grace, Farnam and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance. Slurm has been upgraded to version 21.08.6 on Grace MATLAB/2022a is available on all clusters","title":"Software Highlights"},{"location":"news/2022-06-milgram/","text":"Milgram Maintenance June 7-8, 2022 Software Updates Security updates Slurm updated to 21.08.8-2 NVIDIA drivers updated to 515.43.04 Singularity replaced by Apptainer version 1.0.2 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.23 Hardware Updates The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions. Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 06 milgram"},{"location":"news/2022-06-milgram/#milgram-maintenance","text":"June 7-8, 2022","title":"Milgram Maintenance"},{"location":"news/2022-06-milgram/#software-updates","text":"Security updates Slurm updated to 21.08.8-2 NVIDIA drivers updated to 515.43.04 Singularity replaced by Apptainer version 1.0.2 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.23","title":"Software Updates"},{"location":"news/2022-06-milgram/#hardware-updates","text":"The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions.","title":"Hardware Updates"},{"location":"news/2022-06-milgram/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
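One generic workaround, sketched here with an example hostname and module, is to wrap the remote command in a login shell so the standard environment is initialized explicitly:
# a plain ssh command may not see the module environment; a login shell (bash -l) typically initializes it
ssh netID@grace.hpc.yale.edu 'bash -lc "module load MATLAB/2022a && which matlab"'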
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-06/","text":"June 2022 Announcements Farnam Decommission & McCleary Announcement After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website . RStudio (with module R) has been retired from Open OnDemand as of June 1st Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand. Milgram Maintenance The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details. Software Highlights QTLtools/1.3.1-foss-2020b is now available on Farnam. R/4.2.0-foss-2020b is available on all clusters. Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. Please check to see if any packages you need are available in these modules before running install.packages .","title":"2022 06"},{"location":"news/2022-06/#june-2022","text":"","title":"June 2022"},{"location":"news/2022-06/#announcements","text":"","title":"Announcements"},{"location":"news/2022-06/#farnam-decommission-mccleary-announcement","text":"After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website .","title":"Farnam Decommission & McCleary Announcement"},{"location":"news/2022-06/#rstudio-with-module-r-has-been-retired-from-open-ondemand-as-of-june-1st","text":"Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand.","title":"RStudio (with module R) has been retired from Open OnDemand as of June 1st"},{"location":"news/2022-06/#milgram-maintenance","text":"The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.","title":"Milgram Maintenance"},{"location":"news/2022-06/#software-highlights","text":"QTLtools/1.3.1-foss-2020b is now available on Farnam. R/4.2.0-foss-2020b is available on all clusters. Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. 
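Before running install.packages for something like Seurat, a quick module spider search can confirm whether a module already provides it; a minimal sketch, using the bundle name mentioned above (Seurat is just an example package):

```bash
# See whether Seurat is already provided as an extension of an installed module
module spider Seurat

# Load the bundle that provides it and confirm the version from the shell
module load R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0
Rscript -e 'packageVersion("Seurat")'
```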
Please check to see if any packages you need are available in these modules before running install.packages .","title":"Software Highlights"},{"location":"news/2022-07/","text":"July 2022 Announcements Loomis Decommission After almost a decade in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documenation for more information and updates. Updates to OOD Jupyter App OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda envs in your account, the app now lists only the conda environments with Jupyter installed. If you do not see your desired environment listed in the dropdown, check that you have installed Jupyter in that environment. In addition, the \u201cjupyterlab\u201d checkbox in the app will only be visible if the environment selected has jupyterlab installed. YCRC conda environment ycrc_conda_env.list has been replaced by ycrc_conda_env.sh . To update your conda enviroments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update . Software Highlights miniconda/4.12.0 is now available on all clusters RStudio/2022.02.3-492 is now available on all clusters. This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b. fmriprep/21.0.2 is now available on Milgram. cellranger/7.0.0 is now available on Farnam.","title":"2022 07"},{"location":"news/2022-07/#july-2022","text":"","title":"July 2022"},{"location":"news/2022-07/#announcements","text":"","title":"Announcements"},{"location":"news/2022-07/#loomis-decommission","text":"After almost a decade in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documenation for more information and updates.","title":"Loomis Decommission"},{"location":"news/2022-07/#updates-to-ood-jupyter-app","text":"OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda envs in your account, the app now lists only the conda environments with Jupyter installed. If you do not see your desired environment listed in the dropdown, check that you have installed Jupyter in that environment. In addition, the \u201cjupyterlab\u201d checkbox in the app will only be visible if the environment selected has jupyterlab installed.","title":"Updates to OOD Jupyter App"},{"location":"news/2022-07/#ycrc-conda-environment","text":"ycrc_conda_env.list has been replaced by ycrc_conda_env.sh . To update your conda enviroments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update .","title":"YCRC conda environment"},{"location":"news/2022-07/#software-highlights","text":"miniconda/4.12.0 is now available on all clusters RStudio/2022.02.3-492 is now available on all clusters. 
This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b. fmriprep/21.0.2 is now available on Milgram. cellranger/7.0.0 is now available on Farnam.","title":"Software Highlights"},{"location":"news/2022-08-grace/","text":"Grace Maintenance August 2-4, 2022 Software Updates Security updates Slurm updated to 22.05.2 NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06) Singularity replaced by Apptainer version 1.0.3 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.26 Hardware Updates Core components of the ethernet network were upgraded to improve performance and increase overall capacity. Loomis Decommission and Project Data Migration After over eight years in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. Project. We have migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail. If you had a project space that exceeds the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages. Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling. Scratch60. The Loomis scratch space ( /gpfs/loomis/scratch60 ) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022 . Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch). Changes to Non-Interactive Sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 08 grace"},{"location":"news/2022-08-grace/#grace-maintenance","text":"August 2-4, 2022","title":"Grace Maintenance"},{"location":"news/2022-08-grace/#software-updates","text":"Security updates Slurm updated to 22.05.2 NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06) Singularity replaced by Apptainer version 1.0.3 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.26","title":"Software Updates"},{"location":"news/2022-08-grace/#hardware-updates","text":"Core components of the ethernet network were upgraded to improve performance and increase overall capacity.","title":"Hardware Updates"},{"location":"news/2022-08-grace/#loomis-decommission-and-project-data-migration","text":"After over eight years in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. Project. We have migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail. If you had a project space that exceeds the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages. Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling. Scratch60. The Loomis scratch space ( /gpfs/loomis/scratch60 ) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022 . Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch).","title":"Loomis Decommission and Project Data Migration"},{"location":"news/2022-08-grace/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to Non-Interactive Sessions"},{"location":"news/2022-08/","text":"August 2022 Announcements Grace Maintenance & Storage Changes The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates . SpinUp Researcher Image & Containers Yale offers a simple portal for creating cloud-based compute resources called SpinUp . These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases. Part of this service is a Researcher Image, an Ubuntu-based system which contains a suite of pre-installed commonly utilized software utilities, including: - PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks - GCC, Cmake, Go, and other development tools - Singularity/Apptainer and Docker for container development We recommend researchers looking to develop containers for use on YCRC HPC resources to utilize SpinUp to build containers which can then be copied to the clusters. If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated. Software Highlights AFNI/2022.1.14 is now available on Farnam and Milgram. cellranger/7.0.0 is now available on Grace.","title":"2022 08"},{"location":"news/2022-08/#august-2022","text":"","title":"August 2022"},{"location":"news/2022-08/#announcements","text":"","title":"Announcements"},{"location":"news/2022-08/#grace-maintenance-storage-changes","text":"The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates .","title":"Grace Maintenance & Storage Changes"},{"location":"news/2022-08/#spinup-researcher-image-containers","text":"Yale offers a simple portal for creating cloud-based compute resources called SpinUp . These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases. Part of this service is a Researcher Image, an Ubuntu-based system which contains a suite of pre-installed commonly utilized software utilities, including: - PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks - GCC, Cmake, Go, and other development tools - Singularity/Apptainer and Docker for container development We recommend researchers looking to develop containers for use on YCRC HPC resources to utilize SpinUp to build containers which can then be copied to the clusters. 
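As a rough sketch of that build-then-copy workflow (the definition file, image name, cluster hostname, and destination directory are all placeholders, not a prescribed recipe):

```bash
# On the SpinUp Researcher Image, which ships with Apptainer/Singularity
apptainer build my_tool.sif my_tool.def

# Copy the finished image to your cluster storage; replace <netid>, the
# hostname, and the destination path with your own values
scp my_tool.sif <netid>@<cluster>.hpc.yale.edu:project/containers/
```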
If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated.","title":"SpinUp Researcher Image & Containers"},{"location":"news/2022-08/#software-highlights","text":"AFNI/2022.1.14 is now available on Farnam and Milgram. cellranger/7.0.0 is now available on Grace.","title":"Software Highlights"},{"location":"news/2022-09/","text":"September 2022 Announcements Software Module Extensions Our software module utility ( Lmod ) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command. $ module spider ggplot2 -------------------------------------------------------- ggplot2: -------------------------------------------------------- Versions: ggplot2/3.3.2 (E) ggplot2/3.3.3 (E) ggplot2/3.3.5 (E) $ module spider ggplot2/3.3.5 ----------------------------------------------------------- ggplot2: ggplot2/3.3.5 (E) ----------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R/4.2.0-foss-2020b This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5 . Software Highlights topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.","title":"2022 09"},{"location":"news/2022-09/#september-2022","text":"","title":"September 2022"},{"location":"news/2022-09/#announcements","text":"","title":"Announcements"},{"location":"news/2022-09/#software-module-extensions","text":"Our software module utility ( Lmod ) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command. $ module spider ggplot2 -------------------------------------------------------- ggplot2: -------------------------------------------------------- Versions: ggplot2/3.3.2 (E) ggplot2/3.3.3 (E) ggplot2/3.3.5 (E) $ module spider ggplot2/3.3.5 ----------------------------------------------------------- ggplot2: ggplot2/3.3.5 (E) ----------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. 
R/4.2.0-foss-2020b This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5 .","title":"Software Module Extensions"},{"location":"news/2022-09/#software-highlights","text":"topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.","title":"Software Highlights"},{"location":"news/2022-10-farnam/","text":"Farnam Maintenance October 4-5, 2022 Software Updates Security updates Slurm updated to 22.05.3 NVIDIA drivers updated to 515.65.01 Lmod updated to 8.7 Apptainer updated to 1.0.3 Open OnDemand updated to 2.0.28 Hardware Updates No hardware changes during this maintenance.","title":"2022 10 farnam"},{"location":"news/2022-10-farnam/#farnam-maintenance","text":"October 4-5, 2022","title":"Farnam Maintenance"},{"location":"news/2022-10-farnam/#software-updates","text":"Security updates Slurm updated to 22.05.3 NVIDIA drivers updated to 515.65.01 Lmod updated to 8.7 Apptainer updated to 1.0.3 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-10-farnam/#hardware-updates","text":"No hardware changes during this maintenance.","title":"Hardware Updates"},{"location":"news/2022-10/","text":"October 2022 Announcements Farnam Maintenance The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. See the Farnam maintenance email announcements for more details. Gibbs Maintenance Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures. New Command for Interactive Jobs The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job. Instead of the clunky srun --pty bash syntax from previous versions, this is now replaced with salloc . In addition, the interactive partition is now the default partition for jobs launched using salloc . Thus a simple (1 core, 1 hour) interactive job can be requested like this: salloc which will submit the job and then move your shell to the allocated compute node. For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun : [user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1 salloc: Nodes p09r07n[24,28] are ready for job [user@p09r07n24 ~]$ srun hostname p09r07n24.grace.hpc.yale.internal P09r07n28.grace.hpc.yale.internal For more information on salloc , please refer to Slurm\u2019s documentation . Software Highlights cellranger/7.0.1 is now available on Farnam. LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.","title":"2022 10"},{"location":"news/2022-10/#october-2022","text":"","title":"October 2022"},{"location":"news/2022-10/#announcements","text":"","title":"Announcements"},{"location":"news/2022-10/#farnam-maintenance","text":"The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. 
See the Farnam maintenance email announcements for more details.","title":"Farnam Maintenance"},{"location":"news/2022-10/#gibbs-maintenance","text":"Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures.","title":"Gibbs Maintenance"},{"location":"news/2022-10/#new-command-for-interactive-jobs","text":"The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job. Instead of the clunky srun --pty bash syntax from previous versions, this is now replaced with salloc . In addition, the interactive partition is now the default partition for jobs launched using salloc . Thus a simple (1 core, 1 hour) interactive job can be requested like this: salloc which will submit the job and then move your shell to the allocated compute node. For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun : [user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1 salloc: Nodes p09r07n[24,28] are ready for job [user@p09r07n24 ~]$ srun hostname p09r07n24.grace.hpc.yale.internal P09r07n28.grace.hpc.yale.internal For more information on salloc , please refer to Slurm\u2019s documentation .","title":"New Command for Interactive Jobs"},{"location":"news/2022-10/#software-highlights","text":"cellranger/7.0.1 is now available on Farnam. LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.","title":"Software Highlights"},{"location":"news/2022-11-ruddle/","text":"Ruddle Maintenance November 1, 2022 Software Updates Security updates Slurm updated to 22.05.5 Apptainer updated to 1.1.2 Open OnDemand updated to 2.0.28 Hardware Updates No hardware changes during this maintenance.","title":"2022 11 ruddle"},{"location":"news/2022-11-ruddle/#ruddle-maintenance","text":"November 1, 2022","title":"Ruddle Maintenance"},{"location":"news/2022-11-ruddle/#software-updates","text":"Security updates Slurm updated to 22.05.5 Apptainer updated to 1.1.2 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-11-ruddle/#hardware-updates","text":"No hardware changes during this maintenance.","title":"Hardware Updates"},{"location":"news/2022-11/","text":"November 2022 Announcements Ruddle Maintenance The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details. Grace and Milgram Maintenance Schedule Change We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page. Requeue after Timeout The YCRC clusters all have maximum time-limits that sometimes are shorter than a job needs to finish. This can be a frustration for researchers trying to get a simulation or a project finished. 
However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as \"checkpointing\" and is built into many standard software tools, like Gaussian and Gromacs. Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal: #!/bin/bash #SBATCH -p day #SBATCH -t 24:00:00 #SBATCH -c 1 #SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes #SBATCH --requeue # mark this job eligible for requeueing # define a `trap` that catches the signal and requeues the job trap \"echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID} \" 10 # run the main code, with the `&` to \u201cbackground\u201d the task ./my_code.exe & # wait for either the main code to finish to receive the signal wait This tells Slurm to send SIGNAL10 at ~30s before the job finishes. Then we define an action (or trap ) based on this signal which requeues the job. Don\u2019t forget to add the & to the end of the main executable and the wait command so that the trap is able to catch the signal. Software Highlights MATLAB/2022b is now available on all clusters.","title":"2022 11"},{"location":"news/2022-11/#november-2022","text":"","title":"November 2022"},{"location":"news/2022-11/#announcements","text":"","title":"Announcements"},{"location":"news/2022-11/#ruddle-maintenance","text":"The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.","title":"Ruddle Maintenance"},{"location":"news/2022-11/#grace-and-milgram-maintenance-schedule-change","text":"We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page.","title":"Grace and Milgram Maintenance Schedule Change"},{"location":"news/2022-11/#requeue-after-timeout","text":"The YCRC clusters all have maximum time-limits that sometimes are shorter than a job needs to finish. This can be a frustration for researchers trying to get a simulation or a project finished. However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as \"checkpointing\" and is built into many standard software tools, like Gaussian and Gromacs. Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. 
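For an application with built-in checkpointing, the restart step inside such a job script might look like the following sketch; my_code.exe, its --restart flag, and checkpoint.dat are placeholders for your own tool and file names. The complete resubmission script follows below.

```bash
# Resume from the latest checkpoint if one exists, otherwise start fresh;
# the "&" and the later "wait" keep the shell free to catch the timeout signal
if [ -f checkpoint.dat ]; then
    ./my_code.exe --restart checkpoint.dat &
else
    ./my_code.exe &
fi
wait
```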
Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal: #!/bin/bash #SBATCH -p day #SBATCH -t 24:00:00 #SBATCH -c 1 #SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes #SBATCH --requeue # mark this job eligible for requeueing # define a `trap` that catches the signal and requeues the job trap \"echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID} \" 10 # run the main code, with the `&` to \u201cbackground\u201d the task ./my_code.exe & # wait for either the main code to finish to receive the signal wait This tells Slurm to send SIGNAL10 at ~30s before the job finishes. Then we define an action (or trap ) based on this signal which requeues the job. Don\u2019t forget to add the & to the end of the main executable and the wait command so that the trap is able to catch the signal.","title":"Requeue after Timeout"},{"location":"news/2022-11/#software-highlights","text":"MATLAB/2022b is now available on all clusters.","title":"Software Highlights"},{"location":"news/2022-12-grace/","text":"Grace Maintenance December 6-8, 2022 Software Updates Slurm updated to 22.05.6 NVIDIA drivers updated to 520.61.05 Apptainer updated to 1.1.3 Open OnDemand updated to 2.0.28 Hardware Updates Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps) The InfiniBand network was modified to increase capacity and allow for additional growth Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth Loomis Decommission The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page .","title":"2022 12 grace"},{"location":"news/2022-12-grace/#grace-maintenance","text":"December 6-8, 2022","title":"Grace Maintenance"},{"location":"news/2022-12-grace/#software-updates","text":"Slurm updated to 22.05.6 NVIDIA drivers updated to 520.61.05 Apptainer updated to 1.1.3 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-12-grace/#hardware-updates","text":"Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps) The InfiniBand network was modified to increase capacity and allow for additional growth Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth","title":"Hardware Updates"},{"location":"news/2022-12-grace/#loomis-decommission","text":"The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page .","title":"Loomis Decommission"},{"location":"news/2022-12/","text":"December 2022 Announcements Grace & Gibbs Maintenance The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details. Loomis Decommission The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance starting on December 6th. 
All data except for a few remaining private filesets have already been transferred to other systems (e.g., current software, home, scratch to Palmer and project to Gibbs). The remaining private filesets are being transferred to Gibbs in advance of the maintenance and owners should have received communications accordingly. The only potential user impact of the retirement is on anyone using the older, deprecated software trees. Otherwise, the Loomis retirement should have no user impact but please reach out if you have any concerns or believe you are still using data located on Loomis. See the Loomis Decommission documentation for more information. Apptainer Upgrade on Grace and Ruddle The newest version of Apptainer (v1.1, available now on Ruddle and, after December maintenance, on Grace) comes the ability to create containers without needing elevated privileges (i.e. sudo access). This greatly simplifies the container workflow as you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command. For example, to create a simple toy container from this def file ( lolcow.def ): BootStrap: docker From: ubuntu:20.04 %post apt-get -y update apt-get -y install cowsay lolcat %environment export LC_ALL=C export PATH=/usr/games:$PATH %runscript date | cowsay | lolcat You can run: salloc -p interactive -c 4 apptainer build lolcow.sif lolcow.def This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers . Software Highlights RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.","title":"2022 12"},{"location":"news/2022-12/#december-2022","text":"","title":"December 2022"},{"location":"news/2022-12/#announcements","text":"","title":"Announcements"},{"location":"news/2022-12/#grace-gibbs-maintenance","text":"The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details.","title":"Grace & Gibbs Maintenance"},{"location":"news/2022-12/#loomis-decommission","text":"The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance starting on December 6th. All data except for a few remaining private filesets have already been transferred to other systems (e.g., current software, home, scratch to Palmer and project to Gibbs). The remaining private filesets are being transferred to Gibbs in advance of the maintenance and owners should have received communications accordingly. The only potential user impact of the retirement is on anyone using the older, deprecated software trees. Otherwise, the Loomis retirement should have no user impact but please reach out if you have any concerns or believe you are still using data located on Loomis. 
See the Loomis Decommission documentation for more information.","title":"Loomis Decommission"},{"location":"news/2022-12/#apptainer-upgrade-on-grace-and-ruddle","text":"The newest version of Apptainer (v1.1, available now on Ruddle and, after December maintenance, on Grace) comes the ability to create containers without needing elevated privileges (i.e. sudo access). This greatly simplifies the container workflow as you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command. For example, to create a simple toy container from this def file ( lolcow.def ): BootStrap: docker From: ubuntu:20.04 %post apt-get -y update apt-get -y install cowsay lolcat %environment export LC_ALL=C export PATH=/usr/games:$PATH %runscript date | cowsay | lolcat You can run: salloc -p interactive -c 4 apptainer build lolcow.sif lolcow.def This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers .","title":"Apptainer Upgrade on Grace and Ruddle"},{"location":"news/2022-12/#software-highlights","text":"RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.","title":"Software Highlights"},{"location":"news/2023-01/","text":"January 2023 Announcements Open OnDemand VSCode A new OOD app code-server is now available on all clusters, except Milgram (coming in Feb). Code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server immediately. The app allows you to use GPUs, to allocate large memories, and to specify a private partition (if you have the access), things you won\u2019t be able to do if you run VSCode on a login node. The app is still in beta version and your feedback is much appreciated. Milgram Transfer Node Milgram now has a node dedicated to data transfers to and from the cluster. To access the node from within Milgram, run ssh transfer from the login node. To upload or download data from Milgram via the transfer node, use the hostname transfer-milgram.hpc.yale.edu (must be on VPN). More information can be found in our Transfer Data documentation . With the addition of the new transfer node, we ask that the login nodes are no longer used for data transfers to limit impact on regular login activities.","title":"2023 01"},{"location":"news/2023-01/#january-2023","text":"","title":"January 2023"},{"location":"news/2023-01/#announcements","text":"","title":"Announcements"},{"location":"news/2023-01/#open-ondemand-vscode","text":"A new OOD app code-server is now available on all clusters, except Milgram (coming in Feb). Code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server immediately. The app allows you to use GPUs, to allocate large memories, and to specify a private partition (if you have the access), things you won\u2019t be able to do if you run VSCode on a login node. 
The app is still in beta version and your feedback is much appreciated.","title":"Open OnDemand VSCode"},{"location":"news/2023-01/#milgram-transfer-node","text":"Milgram now has a node dedicated to data transfers to and from the cluster. To access the node from within Milgram, run ssh transfer from the login node. To upload or download data from Milgram via the transfer node, use the hostname transfer-milgram.hpc.yale.edu (must be on VPN). More information can be found in our Transfer Data documentation . With the addition of the new transfer node, we ask that the login nodes are no longer used for data transfers to limit impact on regular login activities.","title":"Milgram Transfer Node"},{"location":"news/2023-02-milgram/","text":"Milgram Maintenance February 7, 2023 Software Updates Slurm updated to 22.05.7 NVIDIA drivers updated to 525.60.13 Apptainer updated to 1.1.4 Open OnDemand updated to 2.0.29 Hardware Updates Milgram\u2019s network was restructured to reduce latency, and improve resiliency.","title":"2023 02 milgram"},{"location":"news/2023-02-milgram/#milgram-maintenance","text":"February 7, 2023","title":"Milgram Maintenance"},{"location":"news/2023-02-milgram/#software-updates","text":"Slurm updated to 22.05.7 NVIDIA drivers updated to 525.60.13 Apptainer updated to 1.1.4 Open OnDemand updated to 2.0.29","title":"Software Updates"},{"location":"news/2023-02-milgram/#hardware-updates","text":"Milgram\u2019s network was restructured to reduce latency, and improve resiliency.","title":"Hardware Updates"},{"location":"news/2023-02/","text":"February 2023 Announcements Milgram Maintenance The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details. McCleary Launch The YCRC is pleased to announce the launch of the new McCleary HPC cluster. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. McCleary will be available in a \u201cbeta\u201d phase to Farnam and Ruddle users later on this month. Keep an eye on your email for further announcements about McCleary\u2019s availability.","title":"2023 02"},{"location":"news/2023-02/#february-2023","text":"","title":"February 2023"},{"location":"news/2023-02/#announcements","text":"","title":"Announcements"},{"location":"news/2023-02/#milgram-maintenance","text":"The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.","title":"Milgram Maintenance"},{"location":"news/2023-02/#mccleary-launch","text":"The YCRC is pleased to announce the launch of the new McCleary HPC cluster. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. McCleary will be available in a \u201cbeta\u201d phase to Farnam and Ruddle users later on this month. 
Keep an eye on your email for further announcements about McCleary\u2019s availability.","title":"McCleary Launch"},{"location":"news/2023-03/","text":"March 2023 Announcements McCleary Now Available The new McCleary HPC cluster is now available for active Farnam and Ruddle users\u2013all other researchers who conduct life sciences research can request an account using our Account Request form . Farnam and Ruddle will be retired in mid-2023 so we encourage all users on those clusters to transition their work to McCleary at your earliest convenience. If you see any issues on the new cluster or have any questions, please let us know at hpc@yale.edu . Open OnDemand VSCode Available Everywhere A new OOD app code-server is now available on all YCRC clusters, including Milgram and McCleary. The new code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server at their earliest convenience. Unlike VSCode on the login node, the new app also enables you to use GPUs, to allocate large memory nodes, and to specify a private partition (if applicable) The app is still in beta version and your feedback is much appreciated. Software Highlights GPU-enabled LAMMPS ( LAMMPS/23Jun2022-foss-2020b-kokkos-CUDA-11.3.1 ) is now available on Grace. AlphaFold/2.3.1-fosscuda-2020b is now available on Farnam and McCleary.","title":"2023 03"},{"location":"news/2023-03/#march-2023","text":"","title":"March 2023"},{"location":"news/2023-03/#announcements","text":"","title":"Announcements"},{"location":"news/2023-03/#mccleary-now-available","text":"The new McCleary HPC cluster is now available for active Farnam and Ruddle users\u2013all other researchers who conduct life sciences research can request an account using our Account Request form . Farnam and Ruddle will be retired in mid-2023 so we encourage all users on those clusters to transition their work to McCleary at your earliest convenience. If you see any issues on the new cluster or have any questions, please let us know at hpc@yale.edu .","title":"McCleary Now Available"},{"location":"news/2023-03/#open-ondemand-vscode-available-everywhere","text":"A new OOD app code-server is now available on all YCRC clusters, including Milgram and McCleary. The new code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server at their earliest convenience. Unlike VSCode on the login node, the new app also enables you to use GPUs, to allocate large memory nodes, and to specify a private partition (if applicable) The app is still in beta version and your feedback is much appreciated.","title":"Open OnDemand VSCode Available Everywhere"},{"location":"news/2023-03/#software-highlights","text":"GPU-enabled LAMMPS ( LAMMPS/23Jun2022-foss-2020b-kokkos-CUDA-11.3.1 ) is now available on Grace. AlphaFold/2.3.1-fosscuda-2020b is now available on Farnam and McCleary.","title":"Software Highlights"},{"location":"news/2023-04/","text":"April 2023 Announcements McCleary in Production Status During March, we have been adding nodes to McCleary, including large memory nodes (4 TiB), GPU nodes and migrating most of the commons nodes from Farnam to McCleary (that are not being retired). Moreover, we have finalized the setup of McCleary and the system is now production stable. 
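One way to move Farnam data is to copy it from the old YSM storage into your new McCleary project space; the paths below are illustrative only (group, netid, and directory names are placeholders), and the McCleary transition documentation remains the authoritative guide.

```bash
# Run on McCleary: copy a directory from the old Farnam project space
# into your McCleary project space
cp -a /gpfs/ysm/project/<group>/<netid>/my_dataset ~/project/

# Verify the copy before cleaning anything up
du -sh /gpfs/ysm/project/<group>/<netid>/my_dataset ~/project/my_dataset
```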
Please feel comfortable to migrate your data and workloads from Farnam and Ruddle to McCleary at your earliest convenience. New YCGA Nodes Online on McCleary McCleary now has over 3000 new cores dedicated to YCGA work! We encourage you to test your workloads and prepare to migrate from Ruddle to McCleary at your earliest convenience. More information can be found here . Software Highlights QuantumESPRESSO/7.1-intel-2020b available on Grace RELION/4.0.1 available on McCleary miniconda/23.1.0 available on all clusters scikit-learn/0.23.2-foss-2020b on Grace and McCleary seff-array updated to 0.4 on Grace, McCleary and Milgram","title":"2023 04"},{"location":"news/2023-04/#april-2023","text":"","title":"April 2023"},{"location":"news/2023-04/#announcements","text":"","title":"Announcements"},{"location":"news/2023-04/#mccleary-in-production-status","text":"During March, we have been adding nodes to McCleary, including large memory nodes (4 TiB), GPU nodes and migrating most of the commons nodes from Farnam to McCleary (that are not being retired). Moreover, we have finalized the setup of McCleary and the system is now production stable. Please feel comfortable to migrate your data and workloads from Farnam and Ruddle to McCleary at your earliest convenience.","title":"McCleary in Production Status"},{"location":"news/2023-04/#new-ycga-nodes-online-on-mccleary","text":"McCleary now has over 3000 new cores dedicated to YCGA work! We encourage you to test your workloads and prepare to migrate from Ruddle to McCleary at your earliest convenience. More information can be found here .","title":"New YCGA Nodes Online on McCleary"},{"location":"news/2023-04/#software-highlights","text":"QuantumESPRESSO/7.1-intel-2020b available on Grace RELION/4.0.1 available on McCleary miniconda/23.1.0 available on all clusters scikit-learn/0.23.2-foss-2020b on Grace and McCleary seff-array updated to 0.4 on Grace, McCleary and Milgram","title":"Software Highlights"},{"location":"news/2023-05-23/","text":"Upcoming Maintenances The McCleary cluster will be unavailable from 9am-1pm on Tuesday May 30 while maintenance is performed on the YCGA storage. The Milgram, Grace and McCleary clusters will not be available from 2pm on Monday June 19 until 10am on Wednesday June 21, due to electrical work being performed in the HPC data center. No changes will be made that impact users of the clusters. The regular Grace maintenance that had been scheduled for June 6-8 will be performed on August 15-17. This change is being made in preparation for the upgrade to RHEL 8 on Grace.","title":"2023 05 23"},{"location":"news/2023-05-23/#upcoming-maintenances","text":"The McCleary cluster will be unavailable from 9am-1pm on Tuesday May 30 while maintenance is performed on the YCGA storage. The Milgram, Grace and McCleary clusters will not be available from 2pm on Monday June 19 until 10am on Wednesday June 21, due to electrical work being performed in the HPC data center. No changes will be made that impact users of the clusters. The regular Grace maintenance that had been scheduled for June 6-8 will be performed on August 15-17. This change is being made in preparation for the upgrade to RHEL 8 on Grace.","title":"Upcoming Maintenances"},{"location":"news/2023-05/","text":"May 2023 Announcements Farnam Decommission: June 1, 2023 After many years of supporting productive science, the Farnam cluster will be decommissioned this summer as we transition to the newly deployed McCleary cluster. 
Logins will be disabled June 1, 2023, which will mark the official end of Farnam\u2019s service. Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. All data on YSM (that you want to keep) will need to be transferred off YSM, either to non-HPC storage or to McCleary project space by you prior to YSM\u2019s retirement. Ruddle Decommission: July 1, 2023 After many years of serving YCGA, the Ruddle cluster will also be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled July 1, 2023, which will mark the official end of Ruddle\u2019s service. We will be migrating project and sequencing directories from Ruddle to McCleary. However, you are responsible for moving home and scratch data to McCleary before July 1, 2023. Please begin to migrate your data and workloads to McCleary at your earliest convenience and reach out with any questions. McCleary Transition Reminder With our McCleary cluster now in a production stable state, we ask all Farnam users to ensure all home, project and scratch data the group wishes to keep is migrated to the new cluster ahead of the June 1st decommission. As June 1st is the formal retirement of Farnam, compute service charges on McCleary commons partitions will begin at this time. Ruddle users will have until July 1st to access the Ruddle and migrate their home and scratch data as needed. Ruddle users will NOT need to migrate their project directories; those will be automatically transferred to McCleary. As previously established on Ruddle, all jobs in the YCGA partitions will be exempt from compute service charges on the new cluster. For more information visit our McCleary Transition documentation . Software Highlights Libmamba solver for conda 23.1.0+ available on all clusters. Conda installations 23.1.0 and newer are now configured to use the faster environment solving algorithm developed by mamba by default. You can simply use conda install and enjoy the significantly faster solve times. GSEA available in McCleary and Ruddle OOD. Gene Set Enrichment Analysis (GSEA) is now available in McCleary OOD and Ruddle OOD for all users. You can access it by clicking \u201cInteractive Apps'' and then selecting \u201cGSEA\u201d. GSEA is a popular computational method to do functional analysis of multi omics data. Data files for GSEA are not centrally stored on the clusters, so you will need to download them from the GSEA website by yourself. NAG/29-GCCcore-11.2.0 available on Grace AFNI/2023.1.01-foss-2020b-Python-3.8.6 on McCleary","title":"2023 05"},{"location":"news/2023-05/#may-2023","text":"","title":"May 2023"},{"location":"news/2023-05/#announcements","text":"","title":"Announcements"},{"location":"news/2023-05/#farnam-decommission-june-1-2023","text":"After many years of supporting productive science, the Farnam cluster will be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled June 1, 2023, which will mark the official end of Farnam\u2019s service. Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. 
All data on YSM (that you want to keep) will need to be transferred off YSM, either to non-HPC storage or to McCleary project space by you prior to YSM\u2019s retirement.","title":"Farnam Decommission: June 1, 2023"},{"location":"news/2023-05/#ruddle-decommission-july-1-2023","text":"After many years of serving YCGA, the Ruddle cluster will also be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled July 1, 2023, which will mark the official end of Ruddle\u2019s service. We will be migrating project and sequencing directories from Ruddle to McCleary. However, you are responsible for moving home and scratch data to McCleary before July 1, 2023. Please begin to migrate your data and workloads to McCleary at your earliest convenience and reach out with any questions.","title":"Ruddle Decommission: July 1, 2023"},{"location":"news/2023-05/#mccleary-transition-reminder","text":"With our McCleary cluster now in a production stable state, we ask all Farnam users to ensure all home, project and scratch data the group wishes to keep is migrated to the new cluster ahead of the June 1st decommission. As June 1st is the formal retirement of Farnam, compute service charges on McCleary commons partitions will begin at this time. Ruddle users will have until July 1st to access the Ruddle and migrate their home and scratch data as needed. Ruddle users will NOT need to migrate their project directories; those will be automatically transferred to McCleary. As previously established on Ruddle, all jobs in the YCGA partitions will be exempt from compute service charges on the new cluster. For more information visit our McCleary Transition documentation .","title":"McCleary Transition Reminder"},{"location":"news/2023-05/#software-highlights","text":"Libmamba solver for conda 23.1.0+ available on all clusters. Conda installations 23.1.0 and newer are now configured to use the faster environment solving algorithm developed by mamba by default. You can simply use conda install and enjoy the significantly faster solve times. GSEA available in McCleary and Ruddle OOD. Gene Set Enrichment Analysis (GSEA) is now available in McCleary OOD and Ruddle OOD for all users. You can access it by clicking \u201cInteractive Apps'' and then selecting \u201cGSEA\u201d. GSEA is a popular computational method to do functional analysis of multi omics data. Data files for GSEA are not centrally stored on the clusters, so you will need to download them from the GSEA website by yourself. NAG/29-GCCcore-11.2.0 available on Grace AFNI/2023.1.01-foss-2020b-Python-3.8.6 on McCleary","title":"Software Highlights"},{"location":"news/2023-06/","text":"June 2023 Announcements McCleary Officially Launches Today marks the official beginning of the McCleary cluster\u2019s service. In addition to compute nodes migrated from Farnam and Ruddle, McCleary features our first set of direct-to-chip liquid cooled (DLC) nodes, moving YCRC into a more environmentally friendly future. McCleary is significantly larger than the Farnam and Ruddle clusters combined. The new DLC compute nodes are able to run faster and with higher CPU density due to their superior cooling system. McCleary is named for Beatrix McCleary Hamburg, who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. 
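As a short illustration of the conda change noted in the May software highlights above (environment and package names here are arbitrary examples):

```bash
module load miniconda/23.1.0

# Solving now uses the libmamba solver by default, so a plain "conda create"
# or "conda install" resolves noticeably faster
conda create -y -n analysis python=3.11 numpy pandas

# Confirm which solver is active
conda config --show solver
```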
Farnam Farewell: June 1, 2023 On the occasion of decommissioning the Farnam cluster on June 1, YCRC would like to acknowledge the profound impact Farnam has had on computing at Yale. Farnam supported biomedical computing at YSM and across the University providing compute resources to hundreds of research groups. Farnam replaced the previous biomedical cluster Louise, and began production in October 2016. Since then, it has run user jobs comprising more than 139 million compute hours. Farnam is replaced by the new cluster McCleary. Please note: Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. For more information see McCleary transfer documentation . Ruddle Decommission: July 1, 2023 The Ruddle cluster will be decommissioned and access will be disabled July 1, 2023. We will be migrating project and sequencing directories from Ruddle to McCleary. Please note: Users are responsible for moving home and scratch data to McCleary prior to July 1, 2023. For more information and instructions, see our McCleary transfer documentation . Software Highlights R/4.3.0-foss-2020b+ available on all clusters. The newest version of R is now available on Grace, McCleary, and Milgram. This updates nearly 1000 packages and can be used in batch jobs and in RStudio sessions via Open OnDemand. AlphaFold/2.3.2-foss-2020b-CUDA-11.3.1 The latest version of AlphaFold (2.3.2, released in April) has been installed on McCleary and is ready for use. This version fixes a number of bugs and should improve GPU memory usage enabling longer proteins to be studied. LAMMPS/23Jun2022-foss-2020b-kokkos available on McCleary RevBayes/1.2.1-GCC-10.2.0 available on McCleary Spark 3.1.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"2023 06"},{"location":"news/2023-06/#june-2023","text":"","title":"June 2023"},{"location":"news/2023-06/#announcements","text":"","title":"Announcements"},{"location":"news/2023-06/#mccleary-officially-launches","text":"Today marks the official beginning of the McCleary cluster\u2019s service. In addition to compute nodes migrated from Farnam and Ruddle, McCleary features our first set of direct-to-chip liquid cooled (DLC) nodes, moving YCRC into a more environmentally friendly future. McCleary is significantly larger than the Farnam and Ruddle clusters combined. The new DLC compute nodes are able to run faster and with higher CPU density due to their superior cooling system. McCleary is named for Beatrix McCleary Hamburg, who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine.","title":"McCleary Officially Launches"},{"location":"news/2023-06/#farnam-farewell-june-1-2023","text":"On the occasion of decommissioning the Farnam cluster on June 1, YCRC would like to acknowledge the profound impact Farnam has had on computing at Yale. Farnam supported biomedical computing at YSM and across the University providing compute resources to hundreds of research groups. Farnam replaced the previous biomedical cluster Louise, and began production in October 2016. Since then, it has run user jobs comprising more than 139 million compute hours. Farnam is replaced by the new cluster McCleary. Please note: Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. 
For more information see McCleary transfer documentation .","title":"Farnam Farewell: June 1, 2023"},{"location":"news/2023-06/#ruddle-decommission-july-1-2023","text":"The Ruddle cluster will be decommissioned and access will be disabled July 1, 2023. We will be migrating project and sequencing directories from Ruddle to McCleary. Please note: Users are responsible for moving home and scratch data to McCleary prior to July 1, 2023. For more information and instructions, see our McCleary transfer documentation .","title":"Ruddle Decommission: July 1, 2023"},{"location":"news/2023-06/#software-highlights","text":"R/4.3.0-foss-2020b+ available on all clusters. The newest version of R is now available on Grace, McCleary, and Milgram. This updates nearly 1000 packages and can be used in batch jobs and in RStudio sessions via Open OnDemand. AlphaFold/2.3.2-foss-2020b-CUDA-11.3.1 The latest version of AlphaFold (2.3.2, released in April) has been installed on McCleary and is ready for use. This version fixes a number of bugs and should improve GPU memory usage enabling longer proteins to be studied. LAMMPS/23Jun2022-foss-2020b-kokkos available on McCleary RevBayes/1.2.1-GCC-10.2.0 available on McCleary Spark 3.1.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"Software Highlights"},{"location":"news/2023-07/","text":"July 2023 Announcements Red Hat 8 Test partitions on Grace As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster to RHEL8 during the August 15th-17th maintenance. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters While we have performed extensive testing, both internally and with the new McCleary cluster, we recognize that there are large numbers of custom workflows on Grace that may need to be modified to work with the new operating system. Please note: To enable debugging and testing of workflows ahead of the scheduled maintenance, we have set aside rhel8_day , rhel8_gpu , and rhel8_mpi partitions. You should access them from the rhel8_login node. Two-factor Authentication for McCleary To assure the security of the cluster and associated services, we have implemented two-factor authentication on the McCleary cluster. To simplify the transition, we have collected a set of best-practices and configurations of many of the commonly used access tools, including CyberDuck, MobaXTerm, and WinSCPon, which you can access on our docs page . If you are using other tools and experiencing issues, please contact us for assistance. New GPU Nodes on McCleary and Grace We have installed new GPU nodes for McCleary and Grace, dramatically increasing the number of GPUs available on both clusters. McCleary has 14 new nodes (56 GPUs) added to the gpu partition and six nodes (24 GPUs) added to pi_cryoem . Grace has 12 new nodes, available in the rhel8_gpu partition. Each of the new nodes contains 4 NVIDIA A5000 GPUs , with 24GB of on-board VRAM and PCIe4 connection to improve data-transport time. 
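As a rough, hedged sketch of how one of these new GPUs might be requested through Slurm (the gpu partition name comes from the list above; the CPU count and time limit are placeholder values): # Request an interactive session with a single GPU on the gpu partition salloc -p gpu --gpus=1 -c 2 -t 1:00:00 # Confirm that the allocated GPU is visible from inside the job nvidia-smi 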
Software Highlights MATLAB/2023a available on all clusters Beast/2.7.4-GCC-12.2.0 available on McCleary AFNI/2023.1.07-foss-2020b available on McCleary FSL 6.0.5.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"2023 07"},{"location":"news/2023-07/#july-2023","text":"","title":"July 2023"},{"location":"news/2023-07/#announcements","text":"","title":"Announcements"},{"location":"news/2023-07/#red-hat-8-test-partitions-on-grace","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster to RHEL8 during the August 15th-17th maintenance. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters While we have performed extensive testing, both internally and with the new McCleary cluster, we recognize that there are large numbers of custom workflows on Grace that may need to be modified to work with the new operating system. Please note: To enable debugging and testing of workflows ahead of the scheduled maintenance, we have set aside rhel8_day , rhel8_gpu , and rhel8_mpi partitions. You should access them from the rhel8_login node.","title":"Red Hat 8 Test partitions on Grace"},{"location":"news/2023-07/#two-factor-authentication-for-mccleary","text":"To assure the security of the cluster and associated services, we have implemented two-factor authentication on the McCleary cluster. To simplify the transition, we have collected a set of best-practices and configurations of many of the commonly used access tools, including CyberDuck, MobaXTerm, and WinSCPon, which you can access on our docs page . If you are using other tools and experiencing issues, please contact us for assistance.","title":"Two-factor Authentication for McCleary"},{"location":"news/2023-07/#new-gpu-nodes-on-mccleary-and-grace","text":"We have installed new GPU nodes for McCleary and Grace, dramatically increasing the number of GPUs available on both clusters. McCleary has 14 new nodes (56 GPUs) added to the gpu partition and six nodes (24 GPUs) added to pi_cryoem . Grace has 12 new nodes, available in the rhel8_gpu partition. Each of the new nodes contains 4 NVIDIA A5000 GPUs , with 24GB of on-board VRAM and PCIe4 connection to improve data-transport time.","title":"New GPU Nodes on McCleary and Grace"},{"location":"news/2023-07/#software-highlights","text":"MATLAB/2023a available on all clusters Beast/2.7.4-GCC-12.2.0 available on McCleary AFNI/2023.1.07-foss-2020b available on McCleary FSL 6.0.5.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"Software Highlights"},{"location":"news/2023-08-grace/","text":"Grace Maintenance August 15-17, 2023 Software Updates Red Hat Enterprise Linux (RHEL) updated to 8.8 Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.2 Open OnDemand updated to 2.0.32 Upgrade to Red Hat 8 As part of this maintenance, the operating system on Grace has been upgraded to Red Hat 8. A new unified software tree that is shared with the McCleary cluster has been created. The ssh host keys for Grace's login nodes were changed during the maintenance, which will result in a \"WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!\" error when you attempt to login. 
To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the list related to Grace. For MobaXterm, this file is located (by default) in Documents/MobaXterm/home/.ssh . Then attempt a new login and accept the new host key. New Open OnDemand (Web Portal) URL The new URL for the Grace Open OnDemand web portal is https://ood-grace.ycrc.yale.edu .","title":"2023 08 grace"},{"location":"news/2023-08-grace/#grace-maintenance","text":"August 15-17, 2023","title":"Grace Maintenance"},{"location":"news/2023-08-grace/#software-updates","text":"Red Hat Enterprise Linux (RHEL) updated to 8.8 Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.2 Open OnDemand updated to 2.0.32","title":"Software Updates"},{"location":"news/2023-08-grace/#upgrade-to-red-hat-8","text":"As part of this maintenance, the operating system on Grace has been upgraded to Red Hat 8. A new unified software tree that is shared with the McCleary cluster has been created. The ssh host keys for Grace's login nodes were changed during the maintenance, which will result in a \"WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!\" error when you attempt to login. To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the list related to Grace. For MobaXterm, this file is located (by default) in Documents/MobaXterm/home/.ssh . Then attempt a new login and accept the new host key.","title":"Upgrade to Red Hat 8"},{"location":"news/2023-08-grace/#new-open-ondemand-web-portal-url","text":"The new URL for the Grace Open OnDemand web portal is https://ood-grace.ycrc.yale.edu .","title":"New Open OnDemand (Web Portal) URL"},{"location":"news/2023-08-milgram/","text":"Milgram Maintenance August 22, 2023_ Software Updates Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.42 Open OnDemand updated to 2.0.32 Multi-Factor Authentication Multi-factor authentication is now required for ssh for all users on Milgram. For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"2023 08 milgram"},{"location":"news/2023-08-milgram/#milgram-maintenance","text":"August 22, 2023_","title":"Milgram Maintenance"},{"location":"news/2023-08-milgram/#software-updates","text":"Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.42 Open OnDemand updated to 2.0.32","title":"Software Updates"},{"location":"news/2023-08-milgram/#multi-factor-authentication","text":"Multi-factor authentication is now required for ssh for all users on Milgram. For most usage, this additional step is minimally invasive and makes our clusters much more secure. 
However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Multi-Factor Authentication"},{"location":"news/2023-08/","text":"August 2023 Announcements Ruddle Farewell: July 24, 2023 On the occasion of decommissioning the Ruddle cluster on July 24, the Yale Center for Genome Analysis (YCGA) and the Yale Center for Research Computing (YCRC) would like to acknowledge the profound impact Ruddle has had on computing at Yale. Ruddle provided the compute resources for YCGA's high throughput sequencing and supported genomic computing for hundreds of research groups at YSM and across the University. In February 2016, Ruddle replaced the previous biomedical cluster BulldogN. Since then, it has run more than 24 million user jobs comprising more than 73 million compute hours. Funding for Ruddle came from NIH grant 1S10OD018521-01, with Shrikant Mane as PI. Ruddle is replaced by a dedicated partition and storage on the new McCleary cluster, which were funded by NIH grant 1S10OD030363-01A1, also awarded to Dr. Mane. Upcoming Grace Maintenance: August 15-17, 2023 Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023. Upcoming Milgram Maintenance: August 22-24, 2023 Scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023. Grace Operating System Upgrade As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters Three test partitions are available ( rhel8_day , rhel8_gpu , and rhel8_mpi ) for use in debugging workflows before the upgrade. These partitions should be accessed from the rhel8_login node. Software Highlights Julia/1.9.2-linux-x86_64 available on Grace Kraken2/2.1.3-gompi-2020b available on McCleary QuantumESPRESSO/7.0-intel-2020b available on Grace","title":"2023 08"},{"location":"news/2023-08/#august-2023","text":"","title":"August 2023"},{"location":"news/2023-08/#announcements","text":"","title":"Announcements"},{"location":"news/2023-08/#ruddle-farewell-july-24-2023","text":"On the occasion of decommissioning the Ruddle cluster on July 24, the Yale Center for Genome Analysis (YCGA) and the Yale Center for Research Computing (YCRC) would like to acknowledge the profound impact Ruddle has had on computing at Yale. Ruddle provided the compute resources for YCGA's high throughput sequencing and supported genomic computing for hundreds of research groups at YSM and across the University. In February 2016, Ruddle replaced the previous biomedical cluster BulldogN. Since then, it has run more than 24 million user jobs comprising more than 73 million compute hours. Funding for Ruddle came from NIH grant 1S10OD018521-01, with Shrikant Mane as PI. 
Ruddle is replaced by a dedicated partition and storage on the new McCleary cluster, which were funded by NIH grant 1S10OD030363-01A1, also awarded to Dr. Mane.","title":"Ruddle Farewell: July 24, 2023"},{"location":"news/2023-08/#upcoming-grace-maintenance-august-15-17-2023","text":"Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023.","title":"Upcoming Grace Maintenance: August 15-17, 2023"},{"location":"news/2023-08/#upcoming-milgram-maintenance-august-22-24-2023","text":"Scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023.","title":"Upcoming Milgram Maintenance: August 22-24, 2023"},{"location":"news/2023-08/#grace-operating-system-upgrade","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters Three test partitions are available ( rhel8_day , rhel8_gpu , and rhel8_mpi ) for use in debugging workflows before the upgrade. These partitions should be accessed from the rhel8_login node.","title":"Grace Operating System Upgrade"},{"location":"news/2023-08/#software-highlights","text":"Julia/1.9.2-linux-x86_64 available on Grace Kraken2/2.1.3-gompi-2020b available on McCleary QuantumESPRESSO/7.0-intel-2020b available on Grace","title":"Software Highlights"},{"location":"news/2023-09/","text":"September 2023 Announcements Grace RHEL8 Upgrade As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we upgraded the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This brings Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters There are a small number of compute nodes in the legacy partition with the old RHEL7 operating system installed for workloads that still need to be migrated. We expect to retire this partition during the Grace December 2023 maintenance. Please contact us if you need help upgrading to RHEL8 in the coming months. Grace Old Software Deprecation The RHEL7 application module tree ( /gpfs/loomis/apps/avx ) is now deprecated and will be removed from the default module environment during the Grace December maintenance. The software will still be available on Grace, but YCRC will no longer provide support for those old packages after December. If you are using a software package in that tree that is not yet installed into the new shared module tree, please let us know as soon as possible so we can help avoid any disruptions. 
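As a hedged illustration of how you might check where a module you depend on lives (the module name here is purely a placeholder): # Print the modulefile location and look for the deprecated RHEL7 tree module show SomeTool/1.0 2>&1 | grep -i '/gpfs/loomis/apps/avx' # If this prints a path under /gpfs/loomis/apps/avx, the package still comes from the old tree and may need a newer install 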
Software Highlights intel/2022b toolchain is now available on Grace and McCleary MKL 2022.2.1 Intel MPI 2022.2.1 Intel Compilers 2022.2.1 foss/2022b toolchain is now available on Grace and McCleary FFTW 3.3.10 ScaLAPACK 2.2.0 OpenMPI 4.1.4 GCC 12.2.0","title":"2023 09"},{"location":"news/2023-09/#september-2023","text":"","title":"September 2023"},{"location":"news/2023-09/#announcements","text":"","title":"Announcements"},{"location":"news/2023-09/#grace-rhel8-upgrade","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we upgraded the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This brings Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters There are a small number of compute nodes in the legacy partition with the old RHEL7 operating system installed for workloads that still need to be migrated. We expect to retire this partition during the Grace December 2023 maintenance. Please contact us if you need help upgrading to RHEL8 in the coming months.","title":"Grace RHEL8 Upgrade"},{"location":"news/2023-09/#grace-old-software-deprecation","text":"The RHEL7 application module tree ( /gpfs/loomis/apps/avx ) is now deprecated and will be removed from the default module environment during the Grace December maintenance. The software will still be available on Grace, but YCRC will no longer provide support for those old packages after December. If you are using a software package in that tree that is not yet installed into the new shared module tree, please let us know as soon as possible so we can help avoid any disruptions.","title":"Grace Old Software Deprecation"},{"location":"news/2023-09/#software-highlights","text":"intel/2022b toolchain is now available on Grace and McCleary MKL 2022.2.1 Intel MPI 2022.2.1 Intel Compilers 2022.2.1 foss/2022b toolchain is now available on Grace and McCleary FFTW 3.3.10 ScaLAPACK 2.2.0 OpenMPI 4.1.4 GCC 12.2.0","title":"Software Highlights"},{"location":"news/2023-10-mccleary/","text":"McCleary Maintenance October 3-5, 2023_ Software Updates Slurm updated to 23.02.5 NVIDIA drivers updated to 535.104.12 Lmod updated to 8.7.30 Apptainer updated to 1.2.3 System Python updated to 3.11","title":"2023 10 mccleary"},{"location":"news/2023-10-mccleary/#mccleary-maintenance","text":"October 3-5, 2023_","title":"McCleary Maintenance"},{"location":"news/2023-10-mccleary/#software-updates","text":"Slurm updated to 23.02.5 NVIDIA drivers updated to 535.104.12 Lmod updated to 8.7.30 Apptainer updated to 1.2.3 System Python updated to 3.11","title":"Software Updates"},{"location":"news/2023-10/","text":"October 2023 Announcements McCleary Maintenance The biannual scheduled maintenance for the McCleary cluster will be occurring Oct 3-5. During this time, the cluster will be unavailable. See the McCleary maintenance email announcements for more details. Interactive jobs on day on McCleary Interactive jobs are now allowed to be run on the day partition on McCleary. Note you are still limited to 4 interactive-style jobs of any kind (salloc or OpenOnDemand) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. 
To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. \"Papermill\" for Jupyter Command-Line Execution Many scientific workflows start as interactive Jupyter notebooks, and our Open OnDemand portal has dramatically simplified deploying these notebooks on cluster resources. However, the step from running notebooks interactively to running jobs as a batch script can be challenging and is often a barrier to migrating to using sbatch to run workflows non-interactively. To help solve this problem, there are a handful of utilities that can execute a notebook as if you were manually hitting \"shift-Enter\" for each cell. Of note is Papermill, which provides a powerful set of tools to bridge between interactive and batch-mode computing. To get started, install papermill into your Conda environments: module load miniconda conda install papermill Then you can simply evaluate a notebook, preserving figures and output inside the notebook, like this: papermill /path/to/notebook.ipynb This can be run inside a batch job that might look like this: #!/bin/bash #SBATCH -p day #SBATCH -c 1 #SBATCH -t 6:00:00 module purge module load miniconda conda activate my_env papermill /path/to/notebook.ipynb Variables can also be parameterized and passed in as command-line options so that you can run multiple copies simultaneously with different input variables. For more information see the Papermill docs pages at https://papermill.readthedocs.io/ .","title":"2023 10"},{"location":"news/2023-10/#october-2023","text":"","title":"October 2023"},{"location":"news/2023-10/#announcements","text":"","title":"Announcements"},{"location":"news/2023-10/#mccleary-maintenance","text":"The biannual scheduled maintenance for the McCleary cluster will be occurring Oct 3-5. During this time, the cluster will be unavailable. See the McCleary maintenance email announcements for more details.","title":"McCleary Maintenance"},{"location":"news/2023-10/#interactive-jobs-on-day-on-mccleary","text":"Interactive jobs are now allowed to be run on the day partition on McCleary. Note you are still limited to 4 interactive-style jobs of any kind (salloc or OpenOnDemand) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Interactive jobs on day on McCleary"},{"location":"news/2023-10/#papermill-for-jupyter-command-line-execution","text":"Many scientific workflows start as interactive Jupyter notebooks, and our Open OnDemand portal has dramatically simplified deploying these notebooks on cluster resources. However, the step from running notebooks interactively to running jobs as a batch script can be challenging and is often a barrier to migrating to using sbatch to run workflows non-interactively. To help solve this problem, there are a handful of utilities that can execute a notebook as if you were manually hitting \"shift-Enter\" for each cell. Of note is Papermill, which provides a powerful set of tools to bridge between interactive and batch-mode computing. 
To get started, install papermill into your Conda environments: module load miniconda conda install papermill Then you can simply evaluate a notebook, preserving figures and output inside the notebook, like this: papermill /path/to/notebook.ipynb This can be run inside a batch job that might look like this: #!/bin/bash #SBATCH -p day #SBATCH -c 1 #SBATCH -t 6:00:00 module purge miniconda conda activate my_env papermill /path/to/notebook.ipynb Variables can also be parameterized and passed in as command-line options so that you can run multiple copies simultaneously with different input variables. For more information see the [Papermill docs pages](https://papermill.readthedocs.io/.","title":"\"Papermill\" for Jupyter Command-Line Execution"},{"location":"news/2023-11/","text":"November 2023 Announcements Globus Available on Milgram Globus is now available to move data in and out from Milgram. For increased security, Globus only has access to a staging directory ( /gpfs/milgram/globus/$NETID ) where you can temporarily store data. Please see our documentation page for more information and reach out to hpc@yale.edu if you have any questions. RStudio Server Updates RStudio Server on the OpenDemand web portal for all clusters now starts an R session in a clean environment and will not save the session when you finish. If you want to save your session and reuse it next time, please select the checkbox \"Start R from your last saved session\".","title":"2023 11"},{"location":"news/2023-11/#november-2023","text":"","title":"November 2023"},{"location":"news/2023-11/#announcements","text":"","title":"Announcements"},{"location":"news/2023-11/#globus-available-on-milgram","text":"Globus is now available to move data in and out from Milgram. For increased security, Globus only has access to a staging directory ( /gpfs/milgram/globus/$NETID ) where you can temporarily store data. Please see our documentation page for more information and reach out to hpc@yale.edu if you have any questions.","title":"Globus Available on Milgram"},{"location":"news/2023-11/#rstudio-server-updates","text":"RStudio Server on the OpenDemand web portal for all clusters now starts an R session in a clean environment and will not save the session when you finish. If you want to save your session and reuse it next time, please select the checkbox \"Start R from your last saved session\".","title":"RStudio Server Updates"},{"location":"resources/","text":"Training & Other Resources The YCRC offers training sessions in a wide range of topics related to research computing taught by YCRC staff, HPC experts at national HPC centers or our vendor partners.","title":"Overview"},{"location":"resources/#training-other-resources","text":"The YCRC offers training sessions in a wide range of topics related to research computing taught by YCRC staff, HPC experts at national HPC centers or our vendor partners.","title":"Training & Other Resources"},{"location":"resources/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogeneous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers (called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that jobs run on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the File Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"resources/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"resources/intro_to_hpc_tutorial/","text":"Introduction to HPC Tutorials To begin, access the cluster through Open OnDemand and open the shell window. This can be done by by going to the top navigation bar, clicking on the Clusters tab and selecting the Shell Access button. Once the new shell window is loaded, you will be able use this interface like your local command interface. Now that you're setup in a shell window, you can begin the first task like so: Part 1: Interactive Jobs Inside of the shell window, start an interactive job with the default resource requests. Once you are allocated space off the login node, load the Miniconda module and create a Conda environment for this exercise. This can be done like so: # Ask for an interactive session salloc # Load the Miniconda module module load miniconda # Create a test environment with Conda that contains the default Python version conda create -yn tutorial_env python jupyter # Activate the new environment conda activate tutorial_env # Deactivate the new environment conda deactivate # Exit your interactive job to free the resources exit Part 2: Batch Jobs Going off of the environment we created in part 1 , navigate to the Files tab in OOD and select your project directory. Click the '+ New File' button and name the file message_decode_tutorial.py . Once the new file is created, open this file in the OOD text editor by going to the file, clicking the three-dot more button, and selecting edit in the dropdown menu like so: Once the text editor is open, paste this python example inside of the file: def message_decode_tutorial ( message , c ): holder = \"\" for letter in range ( 0 , len ( message )): if ( letter + 1 ) % c == 0 : holder = holder + message [ letter ] return holder message = 'gT baZu lWp Kjv uXyeS nViU fdlH gJr KaIc tBpl Sy \\ Jox MtUl Qbm kGTp UdHe hdLJf Nu IcPRu XhBtDjf TsmPf \\ o DoKfw xP qyTcJ tUpYrv Pk ArBCf Wrtp JfRcX JqPdKLC' cypher = message_decode_tutorial ( message , 10 ) with open ( '/home/NETID/decoded_example.txt' , 'w+' ) as output : print ( cypher , file = output ) This python function takes a given message and parses through it against the parameters of a cypher, which in our case writes every 10th letter. Make sure to replace the placeholder 'NETID' in the second to last line with your personal NetID. This will allow your output file to go into your homespace. From here, navigate back to your project directory and select the '+ New File' button, this time naming it batch_tutorial.sh . Using Slurm options to define resource requests for this job, paste the following code inside of this file like you did the previous file: #!/bin/bash #SBATCH --job-name=message_decode_tutorial #SBATCH --time=1:00 #SBATCH --mem-per-cpu=2MB #SBATCH --mail-type=ALL module load miniconda source activate tutorial_env python message_decode_tutorial.py Because the partition isn't specified for this job, it will run on the cluster's default partition. From there, you can go back to the shell window, navigate to your project directory and run the sbatch command to begin your batch job like so: # Navigate to the project directory cd project # Use Slurm to start a batch job sbatch batch_tutorial.sh Once you receive an email saying the job is complete, navigate to your home-space through the shell window on Open OnDemand. Within this directory you will find a file called decoded_example.txt . 
To quickly see the file contents, use the cat command to print the file's contents on the standard output, revealing the decoded message like so: # Navigate to your homespace (replacing NETID with your NetID) cd /home/NETID # Print out the decoded message cat decoded_example.txt Part 3: Interactive Apps on OOD Now that you have completed both an interactive and batch job, try using Jupyter Notebooks on Open OnDemand for your work. This can be done in the shell window like so: # Purge any loaded modules module purge # Build your environment dropdown tab on OOD ycrc_conda_env.sh update Now that this is completed, return to the Open OnDemand homepage and select the Interactive Apps dropdown tab in the top navigation bar. From there you can select Jupyter and load the job submission request form. To select your resources, make sure to consult our Slurm documentation as well as the specific cluster's partition information to ensure you're selecting the appropriate resources for your job's needs. Once the session is submitted and running, connect to the notebook and navigate to your working directory. From there you can either select the Upload button to upload an existing Jupyter notebook file or select the New button to create a new notebook. To help with this, make sure to look over the YCRC Jupyter Notebook information as well as Jupyter's User Interface page .","title":"Introduction to HPC Tutorials"},{"location":"resources/intro_to_hpc_tutorial/#introduction-to-hpc-tutorials","text":"To begin, access the cluster through Open OnDemand and open the shell window. This can be done by going to the top navigation bar, clicking on the Clusters tab and selecting the Shell Access button. Once the new shell window is loaded, you will be able to use this interface like your local command interface. Now that you're set up in a shell window, you can begin the first task like so:","title":"Introduction to HPC Tutorials"},{"location":"resources/intro_to_hpc_tutorial/#part-1-interactive-jobs","text":"Inside of the shell window, start an interactive job with the default resource requests. Once you are allocated space off the login node, load the Miniconda module and create a Conda environment for this exercise. This can be done like so: # Ask for an interactive session salloc # Load the Miniconda module module load miniconda # Create a test environment with Conda that contains the default Python version conda create -yn tutorial_env python jupyter # Activate the new environment conda activate tutorial_env # Deactivate the new environment conda deactivate # Exit your interactive job to free the resources exit","title":"Part 1: Interactive Jobs"},{"location":"resources/intro_to_hpc_tutorial/#part-2-batch-jobs","text":"Going off of the environment we created in part 1 , navigate to the Files tab in OOD and select your project directory. Click the '+ New File' button and name the file message_decode_tutorial.py . 
Once the new file is created, open this file in the OOD text editor by going to the file, clicking the three-dot more button, and selecting edit in the dropdown menu like so: Once the text editor is open, paste this python example inside of the file: def message_decode_tutorial ( message , c ): holder = \"\" for letter in range ( 0 , len ( message )): if ( letter + 1 ) % c == 0 : holder = holder + message [ letter ] return holder message = 'gT baZu lWp Kjv uXyeS nViU fdlH gJr KaIc tBpl Sy \\ Jox MtUl Qbm kGTp UdHe hdLJf Nu IcPRu XhBtDjf TsmPf \\ o DoKfw xP qyTcJ tUpYrv Pk ArBCf Wrtp JfRcX JqPdKLC' cypher = message_decode_tutorial ( message , 10 ) with open ( '/home/NETID/decoded_example.txt' , 'w+' ) as output : print ( cypher , file = output ) This python function takes a given message and parses through it against the parameters of a cypher, which in our case writes every 10th letter. Make sure to replace the placeholder 'NETID' in the second to last line with your personal NetID. This will allow your output file to go into your homespace. From here, navigate back to your project directory and select the '+ New File' button, this time naming it batch_tutorial.sh . Using Slurm options to define resource requests for this job, paste the following code inside of this file like you did the previous file: #!/bin/bash #SBATCH --job-name=message_decode_tutorial #SBATCH --time=1:00 #SBATCH --mem-per-cpu=2MB #SBATCH --mail-type=ALL module load miniconda source activate tutorial_env python message_decode_tutorial.py Because the partition isn't specified for this job, it will run on the cluster's default partition. From there, you can go back to the shell window, navigate to your project directory and run the sbatch command to begin your batch job like so: # Navigate to the project directory cd project # Use Slurm to start a batch job sbatch batch_tutorial.sh Once you receive an email saying the job is complete, navigate to your home-space through the shell window on Open OnDemand. Within this directory you will find a file called decoded_example.txt . To quickly see the file contents, use the cat command to print the file's contents on the standard output, revealing the decoded message like so: # Navigate to your homespace (replacing NETID with your netID) cd /home/NETID # Print out the decoded message cat decoded_example.txt","title":"Part 2: Batch Jobs"},{"location":"resources/intro_to_hpc_tutorial/#part-3-interactive-apps-on-ood","text":"Now that you have completed both an interactive and batch job, try using Jupyter Notebooks on Open OnDemand for your work. This can be done in the shell window like so: # Purge any loaded modules module purge # Build your environment dropdown tab on OOD ycrc_conda_env.sh update Now that this is completed, return to the Open OnDemand homepage and select the Interactive Apps dropdown tab in the top navigation bar. From there you can select Jupyter and load the job submission request form. To select your resources, make sure to consult our Slurm documentation as well as the specific cluster's partition information to ensure you're selecting the appropriate resources for your job's needs. Once the session is submitted and running, connect to the notebook and navigate to your working directory. From there you can either select the Upload button to upload an existing Jupyter notebook file or select the New button to create a new notebook. 
To help with this, make sure to look over the YCRC Jupyter Notebook information as well as Jupyter's User Interface page .","title":"Part 3: Interactive Apps on OOD"},{"location":"resources/national-hpcs/","text":"National HPCs Beyond Yale\u2019s on campus clusters, there are a number of ways for researchers to obtain compute resources (both cycles and storage) at national facilities. Yale researchers may use the Data Management Planning Tool ( DMPtool ) to create, review, and share data management plans that are in accordance with institutional and funder requirements. ACCESS (formerly XSEDE) Quarterly | Application & Info \"Explore Allocations\" are readily available on ACCESS resources for benchmarking and planning runs. For even lower commitment allocations (e.g. to just explore the resource), YCRC staff members have \"Campus Champions\" allocations on all ACCESS resources that can be shared upon request. Contact us for access. ACCESS resources include the following. Up to date information is available at access-ci.org : Stampede2: traditional compute and Phis Jetstream: Science Gateways Bridges2: traditional compute and GPUs Comet: traditional compute and GPUs XStream: GPU cluster Department of Energy NERSC, Argonne Leadership Computing Facility (ALCF), Oak Ridge Leadership Computing Facility (OLCF) INCITE Due in June | Application & Info ALCC Due in June | Application & Info ANL Director\u2019s Discretionary Rolling submission | Application & Info 3-6 month duration. Expectation is that you are using it to gather data for ALCC or INCITE proposal OLCF Director\u2019s Discretionary Rolling submission | Application & Info NCSA: Blue Waters PRAC Due in November | Application & Info Blue Water\u2019s Innovation Allocations Rolling submission | Application & Info Open Science Grid (OSG) Rolling Submission | Application & Info The OSG facilitates access to distributed high throughput computing for research in the US. The resources accessible through the OSG are contributed by the community, organized by the OSG, and governed by the OSG consortium.","title":"National HPCs"},{"location":"resources/national-hpcs/#national-hpcs","text":"Beyond Yale\u2019s on campus clusters, there are a number of ways for researchers to obtain compute resources (both cycles and storage) at national facilities. Yale researchers may use the Data Management Planning Tool ( DMPtool ) to create, review, and share data management plans that are in accordance with institutional and funder requirements.","title":"National HPCs"},{"location":"resources/national-hpcs/#access-formerly-xsede","text":"Quarterly | Application & Info \"Explore Allocations\" are readily available on ACCESS resources for benchmarking and planning runs. For even lower commitment allocations (e.g. to just explore the resource), YCRC staff members have \"Campus Champions\" allocations on all ACCESS resources that can be shared upon request. Contact us for access. ACCESS resources include the following. 
Up to date information is available at access-ci.org : Stampede2: traditional compute and Phis Jetstream: Science Gateways Bridges2: traditional compute and GPUs Comet: traditional compute and GPUs XStream: GPU cluster","title":"ACCESS (formerly XSEDE)"},{"location":"resources/national-hpcs/#department-of-energy","text":"NERSC, Argonne Leadership Computing Facility (ALCF), Oak Ridge Leadership Computing Facility (OLCF)","title":"Department of Energy"},{"location":"resources/national-hpcs/#incite","text":"Due in June | Application & Info","title":"INCITE"},{"location":"resources/national-hpcs/#alcc","text":"Due in June | Application & Info","title":"ALCC"},{"location":"resources/national-hpcs/#anl-directors-discretionary","text":"Rolling submission | Application & Info 3-6 month duration. Expectation is that you are using it to gather data for ALCC or INCITE proposal","title":"ANL Director\u2019s Discretionary"},{"location":"resources/national-hpcs/#olcf-directors-discretionary","text":"Rolling submission | Application & Info","title":"OLCF Director\u2019s Discretionary"},{"location":"resources/national-hpcs/#ncsa-blue-waters","text":"","title":"NCSA: Blue Waters"},{"location":"resources/national-hpcs/#prac","text":"Due in November | Application & Info","title":"PRAC"},{"location":"resources/national-hpcs/#blue-waters-innovation-allocations","text":"Rolling submission | Application & Info","title":"Blue Water\u2019s Innovation Allocations"},{"location":"resources/national-hpcs/#open-science-grid-osg","text":"Rolling Submission | Application & Info The OSG facilitates access to distributed high throughput computing for research in the US. The resources accessible through the OSG are contributed by the community, organized by the OSG, and governed by the OSG consortium.","title":"Open Science Grid (OSG)"},{"location":"resources/online-tutorials/","text":"Online Tutorials Linux/Unix and Command Line Introduction to Linux YCRC Workshop: Practical Introduction to Linux , ( Video ) *Recommended Most Commonly Used Commands - RedHat.com Command Line for Beginners - Ubuntu.com Note: You can learn more about most commands you come across by typing \"man [command]\" into the terminal. awk (text extraction/parsing) awk is a tool for parsing text and extracting certain section. It is particularly useful for extracting, and even reordering, columns out of tables in text files. Introduction to awk and examples of common usage In-depth guide to awk and more advanced usage grep Grep is tool for searching command line output or files for a certain string (phrase) or regular expression. Introduction to grep and examples of common usage In-depth guide to grep and more advanced usage sed sed (Stream EDitor) is a tool for making substitutions in a text file. For example, it can be useful for cleaning (e.g. replace NAN with 0) or reformatting data files. The syntax sed uses for substitutions is common in Linux (for example, the same syntax is used in the VIM text editor). Introduction to sed and examples of common usage In-depth guide to sed and more advanced usage SSH (connecting to the clusters or other remote linux servers) Connecting to the Yale clusters Transfer files to/from the cluster Advanced SSH configuration In-depth guide to ssh Bashrc and Bash Profiles What is the .bashrc and .bash_profile ? [Set aliases for commonly used commands] [Environment variables] tar or tar.gz archive .tar or t.ar.gz are common archive (compressed file) formats. 
Software and data will frequently be distributed in one of these archive formats. The most common command for opening and extracting the contents of a tar archive is tar xvf archive.tar and, for a tar.gz archive, tar xvzf archive.tar.gz . See the following link(s) for more details on creating tar files and more advanced extraction options. Creating and extracting from a tar file Install Windows and Linux on the same computer Windows for Linux It is possible to run Linux terminals and applications from within a Windows installation using the \"Windows Subsystem for Linux\". Windows Subsystem for Linux Dual Boot \"Dual Boot\" means you have two separate installations for Windows and Linux, respectively, that you switch between by restarting your computer. Dual Boot Linux Mint and Windows Dual Boot Ubuntu and Windows Python Intro to Python Fantastic resource for anyone interested in Python LinkedIn Learning: Learning Python (Yale only) Parallel Programming with Python Quick Tutorial: Python Multiprocessing Parallel Programming with Python YCRC Workshop: Parallel Python mpi4py YCRC Workshop: mpi4py mpi4py example scripts Documentation for mpi4py R Intro to R Brief intro to R Thorough intro to R Another thorough intro to R foreach Using the foreach package - Steve Weston foreach + dompi Introduction to doMPI Matlab Mathworks Online Classes Singularity / Apptainer Documentation Singularity has officially been renamed Apptainer, but we expect no changes to its functionality. Apptainer Docs Page Singularity Google Groups Tutorials YCRC Workshop: Containers NIH tutorial on Singularity NVIDIA tutorial for using GPUs with Singularity","title":"Online Tutorials"},{"location":"resources/online-tutorials/#online-tutorials","text":"","title":"Online Tutorials"},{"location":"resources/online-tutorials/#linuxunix-and-command-line","text":"","title":"Linux/Unix and Command Line"},{"location":"resources/online-tutorials/#introduction-to-linux","text":"YCRC Workshop: Practical Introduction to Linux , ( Video ) *Recommended Most Commonly Used Commands - RedHat.com Command Line for Beginners - Ubuntu.com Note: You can learn more about most commands you come across by typing \"man [command]\" into the terminal.","title":"Introduction to Linux"},{"location":"resources/online-tutorials/#awk-text-extractionparsing","text":"awk is a tool for parsing text and extracting certain sections. It is particularly useful for extracting, and even reordering, columns out of tables in text files. Introduction to awk and examples of common usage In-depth guide to awk and more advanced usage","title":"awk (text extraction/parsing)"},{"location":"resources/online-tutorials/#grep","text":"Grep is a tool for searching command line output or files for a certain string (phrase) or regular expression. Introduction to grep and examples of common usage In-depth guide to grep and more advanced usage","title":"grep"},{"location":"resources/online-tutorials/#sed","text":"sed (Stream EDitor) is a tool for making substitutions in a text file. For example, it can be useful for cleaning (e.g. replace NAN with 0) or reformatting data files. The syntax sed uses for substitutions is common in Linux (for example, the same syntax is used in the VIM text editor). 
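For instance, a minimal sketch of the NAN-cleaning case mentioned above (the file names are placeholders): # Replace every occurrence of NAN with 0 and write the result to a new file sed 's/NAN/0/g' raw_data.csv > cleaned_data.csv 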
Introduction to sed and examples of common usage In-depth guide to sed and more advanced usage","title":"sed"},{"location":"resources/online-tutorials/#ssh-connecting-to-the-clusters-or-other-remote-linux-servers","text":"Connecting to the Yale clusters Transfer files to/from the cluster Advanced SSH configuration In-depth guide to ssh","title":"SSH (connecting to the clusters or other remote linux servers)"},{"location":"resources/online-tutorials/#bashrc-and-bash-profiles","text":"What is the .bashrc and .bash_profile ? [Set aliases for commonly used commands] [Environment variables]","title":"Bashrc and Bash Profiles"},{"location":"resources/online-tutorials/#tar-or-targz-archive","text":".tar or t.ar.gz are common archive (compressed file) formats. Software and data will frequently be distributed in one of these archive formats. The most common command for opening and extracting the contents of a tar archive is tar xvf archive.tar and, for a tar.gz archive, tar xvzf archive.tar.gz . See the following link(s) for more details on creating tar files and more advanced extraction options. Creating and extracting from a tar file","title":"tar or tar.gz archive"},{"location":"resources/online-tutorials/#install-windows-and-linux-on-the-same-computer","text":"","title":"Install Windows and Linux on the same computer"},{"location":"resources/online-tutorials/#windows-for-linux","text":"It is possible to run Linux terminals and applications from within a Windows installation using the \"Windows Subsystem for Linux\". Windows Subsystem for Linux","title":"Windows for Linux"},{"location":"resources/online-tutorials/#dual-boot","text":"\"Dual Boot\" means you have two separate installations for Windows and Linux, respectively, that switch between by restarting your computer. Dual Boot Linux Mint and Windows Dual Boot Ubuntu and Windows","title":"Dual Boot"},{"location":"resources/online-tutorials/#python","text":"","title":"Python"},{"location":"resources/online-tutorials/#intro-to-python","text":"Fantastic resource for anyone interested in Python LinkedIn Learning: Learning Python (Yale only)","title":"Intro to Python"},{"location":"resources/online-tutorials/#parallel-programming-with-python","text":"Quick Tutorial: Python Multiprocessing Parallel Programming with Python YCRC Workshop: Parallel Python","title":"Parallel Programming with Python"},{"location":"resources/online-tutorials/#mpi4py","text":"YCRC Workshop: mpi4py mpi4py example scripts Documentation for mpi4py","title":"mpi4py"},{"location":"resources/online-tutorials/#r","text":"","title":"R"},{"location":"resources/online-tutorials/#intro-to-r","text":"Brief intro to R Thorough intro to R Another thorough intro to R","title":"Intro to R"},{"location":"resources/online-tutorials/#foreach","text":"Using the foreach package - Steve Weston","title":"foreach"},{"location":"resources/online-tutorials/#foreach-dompi","text":"Introduction to doMPI","title":"foreach + dompi"},{"location":"resources/online-tutorials/#matlab","text":"Mathworks Online Classses","title":"Matlab"},{"location":"resources/online-tutorials/#singularity-apptainer","text":"","title":"Singularity / Apptainer"},{"location":"resources/online-tutorials/#documentation","text":"Singularity has officially been renamed Apptainer, but we expect no changes to its functionality. 
Apptainer Docs Page Singularity Google Groups","title":"Documentation"},{"location":"resources/online-tutorials/#tutorials","text":"YCRC Workshop: Containers NIH tutorial on Singularity NVIDIA tutorial for using GPUs with Singularity","title":"Tutorials"},{"location":"resources/sw_carpentry/","text":"Software Carpentry To help researchers learn the skills they need, they can utilize Software Carpentry 's in-house training as well as their community-led lesson development to help them get started. These in-house lessons are offered in both English and Spanish and go over Unix and Git basics as well as working with Python and R. To learn more about the community-based lessons available to users, see the Carpentries Lab page for more information.","title":"Software Carpentry"},{"location":"resources/sw_carpentry/#software-carpentry","text":"To help researchers learn the skills they need, they can utilize Software Carpentry 's in-house training as well as their community-led lesson development to help them get started. These in-house lessons are offered in both English and Spanish and go over Unix and Git basics as well as working with Python and R. To learn more about the community-based lessons available to users, see the Carpentries Lab page for more information.","title":"Software Carpentry"},{"location":"resources/yale_library/","text":"Yale Library The Yale Library has many resources available to cluster users. For more information about the Yale Library, see the Ask Yale Library page here . O'Reilly Safari eBooks The Yale Library offers access to the O'Reilly Safari eBooks collection through your Yale credentials. This can be accessed by this Safari eBooks access page making sure to sign in with your Yale email. Once logged on, users can access a variety of digital books and courses.","title":"Yale Library"},{"location":"resources/yale_library/#yale-library","text":"The Yale Library has many resources available to cluster users. For more information about the Yale Library, see the Ask Yale Library page here .","title":"Yale Library"},{"location":"resources/yale_library/#oreilly-safari-ebooks","text":"The Yale Library offers access to the O'Reilly Safari eBooks collection through your Yale credentials. This can be accessed by this Safari eBooks access page making sure to sign in with your Yale email. Once logged on, users can access a variety of digital books and courses.","title":"O'Reilly Safari eBooks"}]} \ No newline at end of file +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Introduction The Yale Center for Research Computing provides support for research computing at Yale University. Our most active area for support is High Performance Computing, however we also support other computationally intensive research. In addition, we work with faculty and research groups across disciplines to design and maintain cost-effective computing capabilities. Introducing the McCleary HPC Cluster The YCRC is pleased to announce the new McCleary HPC cluster, which now serves researchers from the Yale School of Medicine, Yale Center for Genome Analysis and life scientists in the Faculty of Arts and Sciences! For more information, see our McCleary documentation . Get Help To best serve the research community, we provide one-on-one consulting and use a support tracking system. 
Troubleshooting Login Issues If you are experiencing issues logging into one of the clusters, please first check the current System Status for known issues and check the Troubleshoot Login guide first before seeking additional assistance. Web and Email Support To submit requests, issues, or questions please send us an email at hpc@yale.edu or sign on to our online support system at help.ycrc.yale.edu . Your login credentials there are your email and a password of your choosing, not your CAS password. Once received, our system will send an automated response with a link to a ticket. From there we'll track your ticket and make sure it's handled properly by the right person. Replies via email or the online support system go to the same place and are interchangeable. Constructive feedback is much appreciated. Office Hours via Zoom The YCRC hosts weekly office hours via Zoom on Wednesdays at 11am-12pm EST . Every Wednesday, Research support team members will be available to answer questions about the HPC clusters, data storage, cluster usage, etc. No appointments are necessary. Link: https://yale.zoom.us/my/ycrcsupport Phone: 203-432-9666 (2-ZOOM if on-campus)or 646 568 7788; Meeting ID: 224 666 8665 YouTube Channel The YCRC YouTube channel features recorded tutorials and workshops that cover a wide range of computing topics. New videos are added regularly and suggestions for topics can be submitted by emailing research.computing@yale.edu . One-on-One Support Research support team members are available by appointment for one-on-one support. See the table below for information about each person's area of particular focus. Please send requests for appointments with a team member to research.computing@yale.edu . If you have a general question or are unsure about who to meet with, include as much detail as possible about your request and we'll find the right person for you. Specialist Cluster(s) Areas of Focus Kathleen McKiernan All Getting Started Rob Bjornson, Ph.D. McCleary Life Sciences, Bioinformatics, Python, R Tom Langford, Ph.D. Grace / Milgram Physics, Python, MPI Aya Nawano, Ph.D. Grace Molecular Dynamics, Matlab, C/C++ Kaylea Nelson, Ph.D. Grace / Milgram Astronomy, EPS dept, MPI, Python Mike Rothberg, Ph.D. McCleary Computational Chemistry, Python, Matlab Michael Strickler, Ph.D. McCleary Life Sciences, Structural Biology Ping Luo Milgram Wu Tsai Institute, Psychology dept, Open OnDemand Andy Sherman, Ph.D. Grace MPI, GPUs Misha Guy, Ph.D. SRSC Software and Mathematica (email at mikhael.guy@yale.edu for appointment) Q&A Platform The YCRC hosts a Q&A platform at ask.cyberinfrastructure.org . Post questions about the clusters and receive answers from YCRC staff or even your peers! The sub-site for YCRC related questions is available at ask.cyberinfrastructure.org/g/Yale . Acknowledge the YCRC If publishing work performed on a YCRC cluster or with assistance from YCRC staff, we greatly appreciate acknowledgement of our staff and computing time in your publication. A list of YCRC staff can be found on our Staff page , and the clusters are summarized on our HPC Resources page . Example acknowledgement below: We thank the Yale Center for Research Computing, specifically [YCRC staff member name(s)], for guidance and assistance in computation run on the [cluster name here] cluster. 
Additionally, if you would be willing to send the publication information to research.computing@yale.edu , that would assist our efforts to capture work performed on YCRC resources and we can promote your work on our research.computing.yale.edu website.","title":"Introduction"},{"location":"#introduction","text":"The Yale Center for Research Computing provides support for research computing at Yale University. Our most active area for support is High Performance Computing, however we also support other computationally intensive research. In addition, we work with faculty and research groups across disciplines to design and maintain cost-effective computing capabilities. Introducing the McCleary HPC Cluster The YCRC is pleased to announce the new McCleary HPC cluster, which now serves researchers from the Yale School of Medicine, Yale Center for Genome Analysis and life scientists in the Faculty of Arts and Sciences! For more information, see our McCleary documentation .","title":"Introduction"},{"location":"#get-help","text":"To best serve the research community, we provide one-on-one consulting and use a support tracking system. Troubleshooting Login Issues If you are experiencing issues logging into one of the clusters, please first check the current System Status for known issues and check the Troubleshoot Login guide first before seeking additional assistance.","title":"Get Help"},{"location":"#web-and-email-support","text":"To submit requests, issues, or questions please send us an email at hpc@yale.edu or sign on to our online support system at help.ycrc.yale.edu . Your login credentials there are your email and a password of your choosing, not your CAS password. Once received, our system will send an automated response with a link to a ticket. From there we'll track your ticket and make sure it's handled properly by the right person. Replies via email or the online support system go to the same place and are interchangeable. Constructive feedback is much appreciated.","title":"Web and Email Support"},{"location":"#office-hours-via-zoom","text":"The YCRC hosts weekly office hours via Zoom on Wednesdays at 11am-12pm EST . Every Wednesday, Research support team members will be available to answer questions about the HPC clusters, data storage, cluster usage, etc. No appointments are necessary. Link: https://yale.zoom.us/my/ycrcsupport Phone: 203-432-9666 (2-ZOOM if on-campus)or 646 568 7788; Meeting ID: 224 666 8665","title":"Office Hours via Zoom"},{"location":"#youtube-channel","text":"The YCRC YouTube channel features recorded tutorials and workshops that cover a wide range of computing topics. New videos are added regularly and suggestions for topics can be submitted by emailing research.computing@yale.edu .","title":"YouTube Channel"},{"location":"#one-on-one-support","text":"Research support team members are available by appointment for one-on-one support. See the table below for information about each person's area of particular focus. Please send requests for appointments with a team member to research.computing@yale.edu . If you have a general question or are unsure about who to meet with, include as much detail as possible about your request and we'll find the right person for you. Specialist Cluster(s) Areas of Focus Kathleen McKiernan All Getting Started Rob Bjornson, Ph.D. McCleary Life Sciences, Bioinformatics, Python, R Tom Langford, Ph.D. Grace / Milgram Physics, Python, MPI Aya Nawano, Ph.D. Grace Molecular Dynamics, Matlab, C/C++ Kaylea Nelson, Ph.D. 
Grace / Milgram Astronomy, EPS dept, MPI, Python Mike Rothberg, Ph.D. McCleary Computational Chemistry, Python, Matlab Michael Strickler, Ph.D. McCleary Life Sciences, Structural Biology Ping Luo Milgram Wu Tsai Institute, Psychology dept, Open OnDemand Andy Sherman, Ph.D. Grace MPI, GPUs Misha Guy, Ph.D. SRSC Software and Mathematica (email at mikhael.guy@yale.edu for appointment)","title":"One-on-One Support"},{"location":"#qa-platform","text":"The YCRC hosts a Q&A platform at ask.cyberinfrastructure.org . Post questions about the clusters and receive answers from YCRC staff or even your peers! The sub-site for YCRC related questions is available at ask.cyberinfrastructure.org/g/Yale .","title":"Q&A Platform"},{"location":"#acknowledge-the-ycrc","text":"If publishing work performed on a YCRC cluster or with assistance from YCRC staff, we greatly appreciate acknowledgement of our staff and computing time in your publication. A list of YCRC staff can be found on our Staff page , and the clusters are summarized on our HPC Resources page . Example acknowledgement below: We thank the Yale Center for Research Computing, specifically [YCRC staff member name(s)], for guidance and assistance in computation run on the [cluster name here] cluster. Additionally, if you would be willing to send the publication information to research.computing@yale.edu , that would assist our efforts to capture work performed on YCRC resources and we can promote your work on our research.computing.yale.edu website.","title":"Acknowledge the YCRC"},{"location":"glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"news/","text":"News {{ blog_content }}","title":"News"},{"location":"news/#news","text":"{{ blog_content }}","title":"News"},{"location":"user-group/","text":"YCRC User Group The YCRC User Group is a community of researchers at Yale who utilize computing resources and technology to enable their research. You can join the User Group mailing list and forum where you can post questions or tips to other YCRC users at https://groups.io/g/ycrcusergroup .","title":"YCRC User Group"},{"location":"user-group/#ycrc-user-group","text":"The YCRC User Group is a community of researchers at Yale who utilize computing resources and technology to enable their research. You can join the User Group mailing list and forum where you can post questions or tips to other YCRC users at https://groups.io/g/ycrcusergroup .","title":"YCRC User Group"},{"location":"clusters/","text":"HPC Resources The YCRC maintains and supports a number of high performance computing systems for the Yale research community. Our high performance computing systems are named after notable members of the Yale community . Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details. For proposals, we provide a description of our facilities, equipment, and other resources for HPC and research computing . Compute We maintain and support three Red Hat Linux compute clusters, listed below. Please click on cluster names for more information. Info The Farnam and Ruddle clusters were both retired in 2023 and their users are now supported on the McCleary cluster. Cluster Name Approx. Core Count Approx. Node Count Login Address Purpose Grace 26,000 740 grace.ycrc.yale.edu general and highly parallel, tightly coupled (InfiniBand) McCleary 13,000 340 mccleary.ycrc.yale.edu medical and life science, YCGA Milgram 2,400 80 milgram.ycrc.yale.edu HIPAA and other sensitive data Storage We maintain several high performance storage systems. Listed below are these shared filesystems and the clusters where they are available. We distinguish where clusters store their home directories with an asterisk. The directory /home will always point to your home directory on the cluster you logged into. For more information about storage quotas and purchasing storage see the Cluster Storage page. Name Path Size Mounting Clusters File System Software Purpose Palmer /vast/palmer 700 TiB Grace*, McCleary* Vast home, scratch storage Gibbs /gpfs/gibbs 14.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) project, purchased project-style storage Slayman /gpfs/slayman 1.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) purchased project-style storage Milgram /gpfs/milgram 3.0 PiB Milgram* IBM Spectrum Scale (GPFS) Milgram primary storage YCGA /gpfs/ycga 3.0 PiB McCleary IBM Spectrum Scale (GPFS) YCGA storage","title":"Overview"},{"location":"clusters/#hpc-resources","text":"The YCRC maintains and supports a number of high performance computing systems for the Yale research community. Our high performance computing systems are named after notable members of the Yale community . Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details. For proposals, we provide a description of our facilities, equipment, and other resources for HPC and research computing .","title":"HPC Resources"},{"location":"clusters/#compute","text":"We maintain and support three Red Hat Linux compute clusters, listed below. 
Please click on cluster names for more information. Info The Farnam and Ruddle clusters were both retired in 2023 and their users are now supported on the McCleary cluster. Cluster Name Approx. Core Count Approx. Node Count Login Address Purpose Grace 26,000 740 grace.ycrc.yale.edu general and highly parallel, tightly coupled (InfiniBand) McCleary 13,000 340 mccleary.ycrc.yale.edu medical and life science, YCGA Milgram 2,400 80 milgram.ycrc.yale.edu HIPAA and other sensitive data","title":"Compute"},{"location":"clusters/#storage","text":"We maintain several high performance storage systems. Listed below are these shared filesystems and the clusters where they are available. We distinguish where clusters store their home directories with an asterisk. The directory /home will always point to your home directory on the cluster you logged into. For more information about storage quotas and purchasing storage see the Cluster Storage page. Name Path Size Mounting Clusters File System Software Purpose Palmer /vast/palmer 700 TiB Grace*, McCleary* Vast home, scratch storage Gibbs /gpfs/gibbs 14.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) project, purchased project-style storage Slayman /gpfs/slayman 1.0 PiB Grace, McCleary IBM Spectrum Scale (GPFS) purchased project-style storage Milgram /gpfs/milgram 3.0 PiB Milgram* IBM Spectrum Scale (GPFS) Milgram primary storage YCGA /gpfs/ycga 3.0 PiB McCleary IBM Spectrum Scale (GPFS) YCGA storage","title":"Storage"},{"location":"clusters/farnam/","text":"Farnam Farnam was a shared-use resource for the Yale School of Medicine (YSM). The Farnam Cluster was named for Louise Whitman Farnam , the first woman to graduate from the Yale School of Medicine, class of 1916. Farnam Retirement After more than six years in service, the Farnam HPC cluster was retired on June 1, 2023. Farnam was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Farnam"},{"location":"clusters/farnam/#farnam","text":"Farnam was a shared-use resource for the Yale School of Medicine (YSM). The Farnam Cluster was named for Louise Whitman Farnam , the first woman to graduate from the Yale School of Medicine, class of 1916. Farnam Retirement After more than six years in service, the Farnam HPC cluster was retired on June 1, 2023. Farnam was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Farnam"},{"location":"clusters/grace/","text":"Grace Grace is a shared-use resource for the Faculty of Arts and Sciences (FAS). It consists of a variety of compute nodes networked over low-latency InfiniBand and mounts several shared filesystems. The Grace cluster is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper , who received her Ph.D. in Mathematics from Yale in 1934. Operating System Upgrade During the August 2023 maintenance, the operating system on Grace was upgraded from Red Hat 7 to Red Hat 8. For more information, see our Grace Operating System Upgrade page. Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. 
To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware Grace is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 2500 Maximum CPUs per user 1000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 97 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp devel Use the devel partition for jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6126 24 174 skylake, avx512, 6126, nogpu, standard, common 4 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 252 Maximum CPUs per user 108 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 25 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum running jobs per user 2 Maximum CPUs per job 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 7642 8 237 epyc, 7642, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Info Interactive jobs ( salloc or Open OnDemand) are not allowed in the gpu partition. Please submit those jobs to gpu_devel . GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 24 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 10 Maximum GPUs per user 4 Maximum submitted jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 1 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 1 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 4 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. 
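For example, a minimal sbatch sketch for a large-memory job in the bigmem partition; the executable name is hypothetical, and the resource requests should be adjusted to your workload and the partition limits listed below:

```bash
#!/bin/bash
#SBATCH --partition=bigmem       # large-memory partition described above
#SBATCH --time=12:00:00          # within the partition's 1-00:00:00 time limit
#SBATCH --cpus-per-task=8
#SBATCH --mem=500G               # total memory for the job; must fit on a single bigmem node

./my_large_memory_analysis       # hypothetical executable
```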
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 40 Maximum memory per user 4000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp mpi Use the mpi partition for tightly-coupled parallel programs that make efficient use of multiple nodes. See our MPI documentation if your workload fits this description. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 10000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 87 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 135 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 3 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp 16 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 74 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
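For instance, a minimal sketch of a batch request that adds a GPU via the --gpus option; the GPU type is one of those listed for this partition, and the program name is hypothetical:

```bash
#!/bin/bash
#SBATCH --partition=scavenge_gpu
#SBATCH --gpus=rtx2080ti:2       # GPUs are not requested by default; specify a type and count explicitly
#SBATCH --cpus-per-task=4
#SBATCH --time=06:00:00          # scavenge jobs are preemptable and capped at 1-00:00:00

./my_gpu_program                 # hypothetical executable
```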
Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per user 30 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_mpi Use the scavenge_mpi partition to run preemptable jobs on more MPI resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the scavenge_mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_anticevic Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_anticevic partition are subject to the following limits: Limit Value Maximum job time limit 100-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 20 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_balou Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_balou partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 9 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 26 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_berry Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_berry partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_chem_chase Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_chem_chase partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_cowles Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_cowles partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 120 Maximum nodes per user 5 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 9 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_econ_io Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_io partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_econ_lp Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_lp partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 5 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_esi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_esi partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 648 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 36 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_fedorov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_fedorov partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp pi_gelernter Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gelernter partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_hammes_schiffer Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_hammes_schiffer partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 6 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest pi_hodgson Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hodgson partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_holland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_holland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_howard Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_howard partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_jorgensen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jorgensen partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_kim_theodore Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kim_theodore partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_korenaga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_korenaga partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_lederman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_lederman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6254 36 1505 rtx4000,rtx8000,v100 4,2,2 8,48,16 cascadelake, avx512, 6254, pi, bigtmp, rtx8000 pi_levine Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1952 Job Limits Jobs submitted to the pi_levine partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi pi_lora Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_lora partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 5 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp pi_mak Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_mak partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_manohar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_manohar partition are subject to the following limits: Limit Value Maximum job time limit 180-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 8 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 3 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_owen_miller Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_owen_miller partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 5 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_padmanabhan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_padmanabhan partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_panda Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_panda partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 3 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 pi_poland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_poland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 9 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_polimanti Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_polimanti partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_seto Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_seto partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp pi_spielman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_spielman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_sweeney Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sweeney partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 pi_tsmith Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsmith partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_vaccaro Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_vaccaro partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp pi_zhu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhu partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp Storage Grace has access to a number of filesystems. /vast/palmer hosts Grace's home and scratch directories and /gpfs/gibbs hosts project directories and most additional purchased storage allocations. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots Notes home /vast/palmer/home.grace 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Grace"},{"location":"clusters/grace/#grace","text":"Grace is a shared-use resource for the Faculty of Arts and Sciences (FAS). It consists of a variety of compute nodes networked over low-latency InfiniBand and mounts several shared filesystems. The Grace cluster is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper , who received her Ph.D. in Mathematics from Yale in 1934. Operating System Upgrade During the August 2023 maintenance, the operating system on Grace was upgraded from Red Hat 7 to Red Hat 8. For more information, see our Grace Operating System Upgrade page.","title":"Grace"},{"location":"clusters/grace/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Access the Cluster"},{"location":"clusters/grace/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/grace/#partitions-and-hardware","text":"Grace is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. 
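Before turning to the partition details, here is a minimal sketch of the storage commands named above. The getquota and mydirectories commands come from this documentation; the comments describe their documented purpose, and no particular output format is assumed.

```bash
# Check current storage usage and limits for you and your group
getquota

# Print the absolute paths behind the ~/project and ~/palmer_scratch shortcuts
mydirectories
```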
By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/grace/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 2500 Maximum CPUs per user 1000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 97 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp devel Use the devel partition for jobs with which you need ongoing interaction, for example exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6126 24 174 skylake, avx512, 6126, nogpu, standard, common 4 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 252 Maximum CPUs per user 108 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 25 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum running jobs per user 2 Maximum CPUs per job 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 7642 8 237 epyc, 7642, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Info Interactive jobs ( salloc or Open OnDemand) are not allowed in the gpu partition. Please submit those jobs to gpu_devel . GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 24 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 10 Maximum GPUs per user 4 Maximum submitted jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 1 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 1 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 4 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 40 Maximum memory per user 4000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp mpi Use the mpi partition for tightly-coupled parallel programs that make efficient use of multiple nodes. See our MPI documentation if your workload fits this description. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 10000 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, common, bigtmp 72 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 87 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 135 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 3 6240 36 1505 cascadelake, avx512, 6240, nogpu, common, bigtmp 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 4 6346 32 3936 cascadelake, avx512, 6346, common, nogpu, bigtmp 3 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 2 6234 16 1505 cascadelake, avx512, nogpu, 6234, common, bigtmp 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp 16 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 74 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
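To make the --partition, --constraint, and --gpus options above concrete, here is a minimal batch-script sketch that requests two RTX 2080 Ti GPUs on a cascadelake node in the scavenge_gpu partition. The partition name, node feature, GPU type, and default memory value are taken from the tables above; the job name, CPU count, and the program being run (./my_gpu_app) are placeholders, not anything specified by this documentation.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test          # placeholder job name
#SBATCH --partition=scavenge_gpu     # preemptable GPU partition described above
#SBATCH --time=01:00:00              # within the 1-00:00:00 partition time limit
#SBATCH --cpus-per-task=4            # placeholder CPU request
#SBATCH --mem-per-cpu=5120           # matches the partition's default per-CPU memory
#SBATCH --constraint=cascadelake     # restrict to a node feature listed in the table
#SBATCH --gpus=rtx2080ti:2           # GPUs must be requested explicitly

./my_gpu_app                         # placeholder for your GPU-enabled program
```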
Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per user 30 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 12 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, common 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 5 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, common, bigtmp, rtx2080ti 2 6240 36 361 a100 4 40 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, a100 2 6240 36 166 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, common, rtx3090 4 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 4 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, common, v100 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 6 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6136 24 90 v100 2 16 skylake, avx512, 6136, doubleprecision, common, bigtmp, v100 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest scavenge_mpi Use the scavenge_mpi partition to run preemptable jobs on more MPI resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive --mem=92160 Job Limits Jobs submitted to the scavenge_mpi partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum nodes per group 64 Maximum nodes per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 128 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, common, bigtmp","title":"Public Partitions"},{"location":"clusters/grace/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_anticevic Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_anticevic partition are subject to the following limits: Limit Value Maximum job time limit 100-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 20 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_balou Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_balou partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 9 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 26 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_berry Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_berry partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_chem_chase Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_chem_chase partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_cowles Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_cowles partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 120 Maximum nodes per user 5 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 9 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_econ_io Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_io partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_econ_lp Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_econ_lp partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 5 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_esi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_esi partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per user 648 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 36 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_fedorov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_fedorov partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, pi, common, bigtmp pi_gelernter Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gelernter partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 1 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_hammes_schiffer Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_hammes_schiffer partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 6 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 16 6136 24 90 edr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp 2 5122 8 181 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 1 6136 24 749 skylake, avx512, 6136, nogpu, pi, bigtmp 1 E5-2637_v4 8 119 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, bigtmp, gtx1080ti, oldest pi_hodgson Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hodgson partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_holland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_holland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_howard Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_howard partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_jorgensen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jorgensen partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_kim_theodore Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kim_theodore partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp pi_korenaga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_korenaga partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_lederman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_lederman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6254 36 1505 rtx4000,rtx8000,v100 4,2,2 8,48,16 cascadelake, avx512, 6254, pi, bigtmp, rtx8000 pi_levine Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1952 Job Limits Jobs submitted to the pi_levine partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 20 8260 96 181 cascadelake, avx512, 8260, nogpu, pi pi_lora Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=3840 Job Limits Jobs submitted to the pi_lora partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 5 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6136 24 90 hdr, skylake, avx512, 6136, nogpu, standard, pi, bigtmp pi_mak Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_mak partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_manohar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_manohar partition are subject to the following limits: Limit Value Maximum job time limit 180-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 8 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest 2 E7-4820_v4 40 1505 broadwell, E7-4820_v4, nogpu, pi, oldest 1 E5-2660_v4 28 245 p100 1 16 broadwell, E5-2660_v4, doubleprecision, pi, p100, oldest pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 8 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp 9 6136 24 181 p100 4 16 skylake, avx512, 6136, doubleprecision, pi, p100 3 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_owen_miller Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_owen_miller partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp 1 6234 16 1505 cascadelake, avx512, nogpu, 6234, pi, bigtmp 5 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_padmanabhan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_padmanabhan partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_panda Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_panda partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 3 6240 36 181 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6326 32 1001 a100 4 40 cascadelake, avx512, 6326, doubleprecision, bigtmp, pi, a100-80g 1 6254 36 370 rtx2080ti 8 11 cascadelake, avx512, 6254, singleprecision, pi, bigtmp, rtx2080ti 2 6240 36 370 v100 4 16 cascadelake, avx512, 6240, doubleprecision, pi, v100 pi_poland Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_poland partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 8 6240 36 370 cascadelake, avx512, 6240, nogpu, pi, bigtmp 9 E5-2660_v4 28 245 broadwell, E5-2660_v4, nogpu, standard, pi, oldest pi_polimanti Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_polimanti partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp pi_seto Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_seto partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 3 6142 32 181 skylake, avx512, 6142, nogpu, standard, pi, bigtmp pi_spielman Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_spielman partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_sweeney Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sweeney partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 6240 36 180 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, bigtmp, pi, rtx3090 pi_tsmith Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsmith partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp pi_vaccaro Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_vaccaro partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp pi_zhu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhu partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 12 8268 48 356 cascadelake, avx512, 8268, nogpu, standard, pi, bigtmp","title":"Private Partitions"},{"location":"clusters/grace/#storage","text":"Grace has access to a number of filesystems. /vast/palmer hosts Grace's home and scratch directories and /gpfs/gibbs hosts project directories and most additional purchased storage allocations. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots Notes home /vast/palmer/home.grace 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/grace_rhel8/","text":"Grace Operating System Upgrade Grace's current operating system, Red Hat (RHEL) 7, will be offically end-of-life in 2024 and will no longer be supported with security patches by the developer. Therefore Grace has been upgraded to RHEL 8 during the August maintenance window, August 15-17, 2023. This provides a number of key benefits to Grace: consistency with the McCleary cluster continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between the clusters* * some software and workflows will only be supported by YCRC staff on one of the cluster, e.g. tightly couple MPI codes (Grace) or RELION (McCleary). While we have done extensive testing both internally and with the new McCleary cluster, we recognize that there are a large number custom workflows on Grace that may need to be modified to work with the new operating system. To this end, we provided test partition ahead of the upgrade. Now that the upgrade has been rolled out cluster-wide, the test partitions (e.g. rhel8_day ) have been removed. All jobs should be submitted to the normal partitions, which now contain exclusively RHEL 8 nodes. New Host Key The ssh host key for Grace's login nodes were changed during the August maintenance, which will result in an error similar to the following when you attempt to login for the first time after the maintenance. @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! 
@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known hosts file and remove the lines related to Grace. For MobaXterm, this file is located (by default) in Documents\MobaXterm\home\.ssh . Then attempt a new login and accept the new host key. The valid host keys for the login nodes are as follows: 3072 SHA256:8jJ/dKJVntzBJQWW8pU901PHbWcIe2r8ACvq30zQxKU login1 (RSA) 256 SHA256:vhmGumY/XI/PAaheWQCadspl22/mqMiUiNXk+ov/zRc login1 (ECDSA) 256 SHA256:NWNrMNoLwcqMm+E2NpsKKmirSbku9iXgbfk8ucn5aZE login1 (ED25519) New Software Tree Grace now shares a software module tree with the McCleary cluster, providing a more consistent experience for all our users. Existing applications will continue to be available during this transition period. We plan to deprecate and remove the old application tree during the December 2023 maintenance window. If you experience any issues with software, please let us know at hpc@yale.edu and we can look into reinstalling. Common Errors Python not found Under RHEL8, we have only installed Python 3, which must be executed using python3 (not python ). As always, if you need additional packages, we strongly recommend setting up your own conda environment . In addition, Python 2.7 is no longer supported and therefore not installed by default. To use Python 2.7, we request that you set up a conda environment . Missing System Libraries Some of the existing applications may depend on libraries that are no longer installed in the operating system. If you run into these errors please email hpc@yale.edu and include which application/version you are using along with the full error message. We will investigate these on a case-by-case basis and work to get the issue resolved. There will be a small number of compute nodes reserved with RHEL7 (in a partition named legacy ) to enable work to continue while we resolve these issues. This partition will remain available until the December maintenance window. Warning Some of the applications in the new shared apps tree may not work perfectly on the legacy RHEL7 nodes. When running jobs in the legacy partition, you should therefore run module purge at the beginning of interactive sessions and add it to the start of your batch scripts. This will ensure that you only load modules built for RHEL7. Report Issues If you continue to have or discover new issues with your workflow, feel free to contact us for assistance. Please include the working directory, the commands that were run, the software modules used, and any more information needed to reproduce the issue.","title":"Grace Operating System Upgrade"},{"location":"clusters/grace_rhel8/#grace-operating-system-upgrade","text":"Grace's current operating system, Red Hat (RHEL) 7, will be officially end-of-life in 2024 and will no longer be supported with security patches by the developer. Therefore Grace has been upgraded to RHEL 8 during the August maintenance window, August 15-17, 2023.
This provides a number of key benefits to Grace: consistency with the McCleary cluster continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between the clusters* * some software and workflows will only be supported by YCRC staff on one of the clusters, e.g. tightly coupled MPI codes (Grace) or RELION (McCleary). While we have done extensive testing both internally and with the new McCleary cluster, we recognize that there are a large number of custom workflows on Grace that may need to be modified to work with the new operating system. To this end, we provided a test partition ahead of the upgrade. Now that the upgrade has been rolled out cluster-wide, the test partitions (e.g. rhel8_day ) have been removed. All jobs should be submitted to the normal partitions, which now contain exclusively RHEL 8 nodes.","title":"Grace Operating System Upgrade"},{"location":"clusters/grace_rhel8/#new-host-key","text":"The ssh host keys for Grace's login nodes were changed during the August maintenance, which will result in an error similar to the following when you attempt to log in for the first time after the maintenance. @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known hosts file and remove the lines related to Grace. For MobaXterm, this file is located (by default) in Documents\MobaXterm\home\.ssh . Then attempt a new login and accept the new host key. The valid host keys for the login nodes are as follows: 3072 SHA256:8jJ/dKJVntzBJQWW8pU901PHbWcIe2r8ACvq30zQxKU login1 (RSA) 256 SHA256:vhmGumY/XI/PAaheWQCadspl22/mqMiUiNXk+ov/zRc login1 (ECDSA) 256 SHA256:NWNrMNoLwcqMm+E2NpsKKmirSbku9iXgbfk8ucn5aZE login1 (ED25519)","title":"New Host Key"},{"location":"clusters/grace_rhel8/#new-software-tree","text":"Grace now shares a software module tree with the McCleary cluster, providing a more consistent experience for all our users. Existing applications will continue to be available during this transition period. We plan to deprecate and remove the old application tree during the December 2023 maintenance window. If you experience any issues with software, please let us know at hpc@yale.edu and we can look into reinstalling.","title":"New Software Tree"},{"location":"clusters/grace_rhel8/#common-errors","text":"","title":"Common Errors"},{"location":"clusters/grace_rhel8/#python-not-found","text":"Under RHEL8, we have only installed Python 3, which must be executed using python3 (not python ). As always, if you need additional packages, we strongly recommend setting up your own conda environment . In addition, Python 2.7 is no longer supported and therefore not installed by default. To use Python 2.7, we request that you set up a conda environment .","title":"Python not found"},{"location":"clusters/grace_rhel8/#missing-system-libraries","text":"Some of the existing applications may depend on libraries that are no longer installed in the operating system.
If you run into these errors please email hpc@yale.edu and include which application/version you are using along with the full error message. We will investigate these on a case-by-case basis and work to get the issue resolved. There will be a small number of compute nodes reserved with RHEL7 (in a partition named legacy ) to enable work to continue while we resolve these issues. This partition will remain available until the December maintenance window. Warning Some of the applications in the new shared apps tree may not work perfectly on the legacy RHEL7 nodes. When running jobs in the legacy partition, you should therefore run module purge at the begining of interactive sessions and add it to the start of your batch scripts. This will ensure that you only load modules built for RHEL7.","title":"Missing System Libraries"},{"location":"clusters/grace_rhel8/#report-issues","text":"If you continue to have or discover new issues with your workflow, feel free to contact us for assistance. Please include the working directory, the commands that were run, the software modules used, and any more information needed to reproduce the issue.","title":"Report Issues"},{"location":"clusters/maintenance/","text":"Cluster Maintenance Each YCRC cluster undergoes regular scheduled maintenance twice a year. During the maintenance, the cluster is unavailable, logins are deactivated and all pending jobs are held. Unless otherwise stated, the storage for that cluster will also be inaccessible during the maintenance. We use this opportunity when jobs are not running and there are no users on the machine to make upgrades and changes that would be disruptive. These activities include updating and patching the compute resources including the compute nodes, networking, service nodes and storage as well as making changes to critical infrastructure. Each maintenance is scheduled for three days, from Tuesday morning through end of day Thursday of the respective week. In many cases, the cluster may return to service early and, under extenuating circumstances, we may choose to extend maintenance if necessary to make sure the system is stable before restoring access and jobs. Communication will be sent to all users of the respective cluster both 4 weeks and 1 week prior to the maintenance period. Schedule The schedule for the regular cluster maintenance is posted below. Please be mindful of these dates and schedule your work accordingly to avoid disruptions. Date Cluster Dec 5-7 2023 Grace Feb 6-8 2024 Milgram Apr 2-4 2024 McCleary Jun 4-6 2024 Grace Aug 20-22 2024 Milgram Oct 1-3 2024 McCleary Dec 3-5 2024 Grace Occasionally we will schedule additional maintenance periods beyond those listed above, and potentially with shorter notices, if urgent work arises, such as power work on the data center or critical upgrades for stability or security. We will give as much notice as possible in advance of these maintenance outages.","title":"Cluster Maintenance"},{"location":"clusters/maintenance/#cluster-maintenance","text":"Each YCRC cluster undergoes regular scheduled maintenance twice a year. During the maintenance, the cluster is unavailable, logins are deactivated and all pending jobs are held. Unless otherwise stated, the storage for that cluster will also be inaccessible during the maintenance. We use this opportunity when jobs are not running and there are no users on the machine to make upgrades and changes that would be disruptive. 
These activities include updating and patching the compute resources including the compute nodes, networking, service nodes and storage as well as making changes to critical infrastructure. Each maintenance is scheduled for three days, from Tuesday morning through end of day Thursday of the respective week. In many cases, the cluster may return to service early and, under extenuating circumstances, we may choose to extend maintenance if necessary to make sure the system is stable before restoring access and jobs. Communication will be sent to all users of the respective cluster both 4 weeks and 1 week prior to the maintenance period.","title":"Cluster Maintenance"},{"location":"clusters/maintenance/#schedule","text":"The schedule for the regular cluster maintenance is posted below. Please be mindful of these dates and schedule your work accordingly to avoid disruptions. Date Cluster Dec 5-7 2023 Grace Feb 6-8 2024 Milgram Apr 2-4 2024 McCleary Jun 4-6 2024 Grace Aug 20-22 2024 Milgram Oct 1-3 2024 McCleary Dec 3-5 2024 Grace Occasionally we will schedule additional maintenance periods beyond those listed above, and potentially with shorter notices, if urgent work arises, such as power work on the data center or critical upgrades for stability or security. We will give as much notice as possible in advance of these maintenance outages.","title":"Schedule"},{"location":"clusters/mccleary-farnam-ruddle/","text":"McCleary for Farnam and Ruddle Users McCleary is the successor to both the Farnam and Ruddle clusters, which were retired in summer 2023. Key Dates Farnam April: Migration of purchased nodes and storage from Farnam to McCleary June 1st: Access to Farnam login and OnDemand nodes disabled Compute service charges on McCleary commons partitions begin July 13: /gpfs/ysm no longer be available Ruddle April: Migration of purchased nodes from Ruddle to McCleary June 1st: Official Farnam retirement date, and beginning of compute service charges on McCleary commons partitions. Jobs in the ycga partitions will always be exempt from compute service charge. July 24th: Access to Ruddle login and OnDemand nodes disabled. Old /gpfs/ycga replaced with new system. Accounts Most Farnam and Ruddle users who have been active in the last year have accounts automatically created on McCleary for them and have received an email to that effect. All other users who conduct life sciences research can request an account using our Account Request form . Group Membership Check which group your new McCleary account is associated with and make sure that matches your expection. This is the group that will be charged (if/when applicable) for your compute usage as well as dictate which private partitions you may have access to. Any cluster specific changes previously made on Farnam or Ruddle will not be automatically reflected on McCleary. To check, run the following command (replacing with your netid): sacctmgr show user If you need your group association changed, please let us know at hpc@yale.edu . Access Hostname McCleary can be accessed via SSH (or MobaXterm) at the hostname mccleary.ycrc.yale.edu . Transfers and transfer applications should be connected via transfer-mccleary.ycrc.yale.edu . Note The hostname does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead. Multifactor authentication via Duo is required for all users on McCleary, similar to how Ruddle is currently configured. This will be new to Farnam users. 
For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation . Web Portal (Open OnDemand) McCleary web portal url is available at ood-mccleary.ycrc.yale.edu . On McCleary, you are limited to 4 interactive app instances (of any type) through the web portal at one time. Additional instances will remain pending in the queue until you terminate older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. Note Again, the url does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead. Software We have installed most commonly used software modules from Farnam and Ruddle onto McCleary. Usage of modules on McCleary is similar to the other clusters (e.g. module avail , module load ). Some software may only be initially available in a newer version than was installed on Farnam or Ruddle. If you cannot find a software package on McCleary that you need, please let us know at hpc@yale.edu and we can look into installing it for you. Partition and Job Scheduler The most significant changes on transitioning from Farnam or Ruddle to McCleary is in respect to the partition scheme. McCleary uses the partition scheme used on the Grace and Milgram clusters, so should be familiar to users of those clusters. A full list of McCleary partitions can be found on the cluster page . Default Time Request The default walltime on McCleary is 1 hour on all partitions, down from 24 hours on Farnam and Ruddle. Use the -t flag to request a longer time limit. Changes to Partitions Below are notable changes to the partitions relative to Farnam and Ruddle. Many of these changes are reductions to maximum time request. If you job cannot run in the available partition time limits, please contact us at hpc@yale.edu so we can discuss your situation. general McCleary does not have a general partition, but instead has day and week partitions with maximum time limits of 24 hours and 7 days, respectively. The week partition contains significantly fewer nodes than day and will reject any job that request less than 24 hours of walltime, so please think carefully about how long your job needs to run for when selecting a partition. We strongly encourage checkpointing if it is an option or dividing up your workload into less than 24 hour chunks. This scheme promotes high turnover of compute resources and reduces the number of idle jobs, resulting in lower overall wait time. Interactive jobs are blocked from running in the day or week partitions. See the interactive partition below instead. day is the default partition for batch jobs (where your job goes if you do not specify a partition with -p or --partition ). interactive The interactive partition is called devel and contains a set of dedicated nodes specifically for development or interactive uses ( salloc jobs). To ensure high availability of resources, users are limited to one job at time. That job cannot request more than 6 hours, 4 cpus and 32G of memory. devel is the default partition for jobs started using salloc (where your job goes if you do not specify a partition with -p or --partition ). bigmem McCleary has a bigmem partition, but the maximum time request is now 24 hours. Jobs requesting less than 120G of RAM will be rejected from the partition and we ask you to submit those jobs to day . 
scavenge McCleary has a scavenge partition that operates in the same preemptable mode as before, but the maximum time request is now 24 hours. gpu_devel There is no gpu_devel on McCleary. We are evaluating the needs and potential solutions for interactive GPU-enabled jobs. For now, interactive GPU-enabled jobs should be submitted to the gpu partition. YCGA Compute YCGA researchers have access to a dedicated set of nodes totaling over 3000 cores on McCleary that are prefixed with ycga . ycga : general purpose partition for batch jobs ycga_interactive : partition for interactive jobs (limit of 1 job at a time in this partition) ycga_bigmem : for jobs requiring a large amount of RAM (>120G) Dedicated Nodes If you have purchased nodes on Farnam or Ruddle that are not in the haswell generation, we have coordinated with your group to migrate those nodes to McCleary in April into a partition of the same name. Storage and Data If you have data on the Gibbs filesystem, no action was required, as that data is already available on McCleary. Farnam Data Farnam\u2019s primary filesystem, YSM (/gpfs/ysm), was retired on July 13th. If you previously had a Farnam account, you have been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Data The YCGA storage system ( /gpfs/ycga ) has been replaced with a new, larger storage system at the same namespace. All data in the project (now at work ), sequencers , special , and pi directories under /gpfs/ycga were migrated by YCRC staff to the new storage system. All other data on /gpfs/ycga (Ruddle home and scratch60) was retired with Ruddle on July 24th. As a McCleary user, you have also been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Project Data Data previously in /gpfs/ycga/project// can now be found at /gpfs/ycga/work// . The project symlink in your home directory links to your Gibbs project space, not your YCGA storage. Researchers with Purchased Storage If you have purchased space on /gpfs/ycga or /gpfs/ysm that has not expired, we have migrated your allocation. This is the only data that the YCRC automatically migrated from Farnam to McCleary. If you have purchased storage on /gpfs/ysm that has expired as of December 31st 2022, you should have received a separate communication from us with information on purchasing replacement storage on Gibbs (which is available on McCleary). If you have any questions or concerns about what has been moved to McCleary and when, please reach out to us. Storage@Yale (SAY) Shares Storage@Yale shares are available on McCleary, but only on the transfer node. To access your SAY data, make sure to log in to the transfer node and then copy your data to either project or scratch , as in the sketch below.
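A minimal sketch of that workflow follows; the SAY mount point, destination directory, and netid are placeholders, not paths documented on this page.
# log in to the dedicated transfer node (hostname given above)
ssh netid@transfer-mccleary.ycrc.yale.edu
# copy from your SAY share into project storage; replace the source path with
# the actual mount point of your share -- the path here is a placeholder
rsync -avP /path/to/your/SAY/share/ ~/project/say_data/
Because rsync only transfers files that have changed, the command can safely be re-run if a copy is interrupted.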
Note, this is different than how Ruddle was set up, where SAY shares were available on all nodes.","title":"McCleary for Farnam and Ruddle Users"},{"location":"clusters/mccleary-farnam-ruddle/#mccleary-for-farnam-and-ruddle-users","text":"McCleary is the successor to both the Farnam and Ruddle clusters, which were retired in summer 2023.","title":"McCleary for Farnam and Ruddle Users"},{"location":"clusters/mccleary-farnam-ruddle/#key-dates","text":"","title":"Key Dates"},{"location":"clusters/mccleary-farnam-ruddle/#farnam","text":"April: Migration of purchased nodes and storage from Farnam to McCleary June 1st: Access to Farnam login and OnDemand nodes disabled Compute service charges on McCleary commons partitions begin July 13: /gpfs/ysm no longer be available","title":"Farnam"},{"location":"clusters/mccleary-farnam-ruddle/#ruddle","text":"April: Migration of purchased nodes from Ruddle to McCleary June 1st: Official Farnam retirement date, and beginning of compute service charges on McCleary commons partitions. Jobs in the ycga partitions will always be exempt from compute service charge. July 24th: Access to Ruddle login and OnDemand nodes disabled. Old /gpfs/ycga replaced with new system.","title":"Ruddle"},{"location":"clusters/mccleary-farnam-ruddle/#accounts","text":"Most Farnam and Ruddle users who have been active in the last year have accounts automatically created on McCleary for them and have received an email to that effect. All other users who conduct life sciences research can request an account using our Account Request form . Group Membership Check which group your new McCleary account is associated with and make sure that matches your expection. This is the group that will be charged (if/when applicable) for your compute usage as well as dictate which private partitions you may have access to. Any cluster specific changes previously made on Farnam or Ruddle will not be automatically reflected on McCleary. To check, run the following command (replacing with your netid): sacctmgr show user If you need your group association changed, please let us know at hpc@yale.edu .","title":"Accounts"},{"location":"clusters/mccleary-farnam-ruddle/#access","text":"","title":"Access"},{"location":"clusters/mccleary-farnam-ruddle/#hostname","text":"McCleary can be accessed via SSH (or MobaXterm) at the hostname mccleary.ycrc.yale.edu . Transfers and transfer applications should be connected via transfer-mccleary.ycrc.yale.edu . Note The hostname does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead. Multifactor authentication via Duo is required for all users on McCleary, similar to how Ruddle is currently configured. This will be new to Farnam users. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Hostname"},{"location":"clusters/mccleary-farnam-ruddle/#web-portal-open-ondemand","text":"McCleary web portal url is available at ood-mccleary.ycrc.yale.edu . On McCleary, you are limited to 4 interactive app instances (of any type) through the web portal at one time. Additional instances will remain pending in the queue until you terminate older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. 
Note Again, the url does not use the domain hpc.yale.edu, but uses ycrc .yale.edu instead.","title":"Web Portal (Open OnDemand)"},{"location":"clusters/mccleary-farnam-ruddle/#software","text":"We have installed most commonly used software modules from Farnam and Ruddle onto McCleary. Usage of modules on McCleary is similar to the other clusters (e.g. module avail , module load ). Some software may only be initially available in a newer version than was installed on Farnam or Ruddle. If you cannot find a software package on McCleary that you need, please let us know at hpc@yale.edu and we can look into installing it for you.","title":"Software"},{"location":"clusters/mccleary-farnam-ruddle/#partition-and-job-scheduler","text":"The most significant changes on transitioning from Farnam or Ruddle to McCleary is in respect to the partition scheme. McCleary uses the partition scheme used on the Grace and Milgram clusters, so should be familiar to users of those clusters. A full list of McCleary partitions can be found on the cluster page .","title":"Partition and Job Scheduler"},{"location":"clusters/mccleary-farnam-ruddle/#default-time-request","text":"The default walltime on McCleary is 1 hour on all partitions, down from 24 hours on Farnam and Ruddle. Use the -t flag to request a longer time limit.","title":"Default Time Request"},{"location":"clusters/mccleary-farnam-ruddle/#changes-to-partitions","text":"Below are notable changes to the partitions relative to Farnam and Ruddle. Many of these changes are reductions to maximum time request. If you job cannot run in the available partition time limits, please contact us at hpc@yale.edu so we can discuss your situation.","title":"Changes to Partitions"},{"location":"clusters/mccleary-farnam-ruddle/#general","text":"McCleary does not have a general partition, but instead has day and week partitions with maximum time limits of 24 hours and 7 days, respectively. The week partition contains significantly fewer nodes than day and will reject any job that request less than 24 hours of walltime, so please think carefully about how long your job needs to run for when selecting a partition. We strongly encourage checkpointing if it is an option or dividing up your workload into less than 24 hour chunks. This scheme promotes high turnover of compute resources and reduces the number of idle jobs, resulting in lower overall wait time. Interactive jobs are blocked from running in the day or week partitions. See the interactive partition below instead. day is the default partition for batch jobs (where your job goes if you do not specify a partition with -p or --partition ).","title":"general"},{"location":"clusters/mccleary-farnam-ruddle/#interactive","text":"The interactive partition is called devel and contains a set of dedicated nodes specifically for development or interactive uses ( salloc jobs). To ensure high availability of resources, users are limited to one job at time. That job cannot request more than 6 hours, 4 cpus and 32G of memory. devel is the default partition for jobs started using salloc (where your job goes if you do not specify a partition with -p or --partition ).","title":"interactive"},{"location":"clusters/mccleary-farnam-ruddle/#bigmem","text":"McCleary has a bigmem partition, but the maximum time request is now 24 hours. 
Jobs requesting less than 120G of RAM will be rejected from the partition and we ask you to submit those jobs to day .","title":"bigmem"},{"location":"clusters/mccleary-farnam-ruddle/#scavenge","text":"McCleary has a scavenge partition that operates in the same preemptable mode as before, but the maximum time request is now 24 hours.","title":"scavenge"},{"location":"clusters/mccleary-farnam-ruddle/#gpu_devel","text":"There is no gpu_devel on McCleary. We are evaluating the needs and potential solutions for interactive GPU-enabled jobs. For now, interactive GPU-enabled jobs should be submitted to the gpu partition.","title":"gpu_devel"},{"location":"clusters/mccleary-farnam-ruddle/#ycga-compute","text":"YCGA researchers have access to a dedicated set of nodes totally over 3000 cores on McCleary that are prefixed with ycga . ycga : general purpose partition for batch jobs ycga_interactive : partition for interactive jobs (limit of 1 job at a time in this partition) ycga_bigmem : for jobs requiring large amount of RAM (>120G)","title":"YCGA Compute"},{"location":"clusters/mccleary-farnam-ruddle/#dedicated-nodes","text":"If you have purchased nodes on Farnam or Ruddle that are not in the haswell generation, we have coordinated with your group to migrate those nodes to McCleary in April into a partition of the same name.","title":"Dedicated Nodes"},{"location":"clusters/mccleary-farnam-ruddle/#storage-and-data","text":"If you have data on the Gibbs filesystem, there was no action required as they are already available on McCleary.","title":"Storage and Data"},{"location":"clusters/mccleary-farnam-ruddle/#farnam-data","text":"Farnam\u2019s primary filesystem, YSM (/gpfs/ysm), was retired on July 13th. If you previously had a Farnam account, you have been give new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu .","title":"Farnam Data"},{"location":"clusters/mccleary-farnam-ruddle/#ruddle-data","text":"The YCGA storage system ( /gpfs/ycga ) has been replaced with a new, larger storage system at the same namespace. All data in the project (now at work ), sequencers , special , and pi directories under /gpfs/ycga were migrated by YCRC staff to the new storage system. All other data on /gpfs/ycga (Ruddle home and scratch60) was retired with Ruddle on July 24th. As a McCleary user, you have also been given new, empty home and scratch directories for McCleary on our Palmer filesystem and a 1 TiB project space on our Gibbs filesystem. Project quotas can be increased to 4 TiB at no cost by sending a request to hpc@yale.edu . Ruddle Project Data Data previously in /gpfs/ycga/project// can now be found at /gpfs/ycga/work// . The project symlink in your home directory links to your Gibbs project space, not your YCGA storage.","title":"Ruddle Data"},{"location":"clusters/mccleary-farnam-ruddle/#researchers-with-purchased-storage","text":"If you have purchased space on /gpfs/ycga or /gpfs/ysm that has not expired, we have migrated your allocation. This is the only data that the YCRC automatically migrated from Farnam to McCleary. If you have purchased storage on /gpfs/ysm that has expired as of December 31st 2022, you should have received a separate communication from us with information on purchasing replacement storage on Gibbs (which is available on McCleary). 
If you have any questions or concerns about what has been moved to McCleary and when, please reach out to us.","title":"Researchers with Purchased Storage"},{"location":"clusters/mccleary-farnam-ruddle/#storageyale-say-shares","text":"Storage@Yale shares are available on McCleary, but only on the transfer node. To access your SAY data, make sure to login to the transfer node and then copy your data to either project or scratch . Note, this is different than how Ruddle was set up, where SAY shares were available on all nodes.","title":"Storage@Yale (SAY) Shares"},{"location":"clusters/mccleary/","text":"McCleary McCleary is a shared-use resource for the Yale School of Medicine (YSM), life science researchers elsewhere on campus and projects related to the Yale Center for Genome Analysis . It consists of a variety of compute nodes networked over ethernet and mounts several shared filesystems. McCleary is named for Beatrix McCleary Hamburg , who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. The McCleary HPC cluster is Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. Info Farnam or Ruddle user? Farnam and Ruddle were both retired in summer 2023. See our explainer for what you need to know about using McCleary and how it differs from Farnam and Ruddle. Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware McCleary is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Info YCGA sequence data user? To avoid being charged for your cpu usage for YCGA-related work, make sure to submit jobs to the ycga partition with -p ycga. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 512 Maximum memory per group 6000G Maximum CPUs per user 256 Maximum memory per user 3000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 26 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 15 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common devel Use the devel partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 10 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, common week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 192 Maximum memory per group 2949G Maximum CPUs per user 192 Maximum memory per user 2949G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common long Use the long partition for jobs that need a longer runtime than week allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the long partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per group 36 Maximum CPUs per user 36 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 72 8 227 milan, 72F3, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
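As a rough sketch, a batch script for this partition might look like the following; the CPU, memory, and time values are illustrative assumptions rather than recommendations from this page.
#!/bin/bash
#SBATCH --partition=gpu            # the gpu partition described above
#SBATCH --gpus=gtx1080ti:2         # two GTX 1080Ti GPUs, matching the example above
#SBATCH --cpus-per-task=4          # illustrative CPU count
#SBATCH --mem=16G                  # illustrative memory request
#SBATCH --time=02:00:00            # must fit within the partition's maximum job time limit

nvidia-smi                         # confirm the allocated GPUs are visible before launching your application
The same --partition and --gpus options can be passed to salloc for an interactive session.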
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per group 24 Maximum GPUs per user 12 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 14 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2623_v4 8 38 gtx1080ti 4 11 broadwell, E5-2623_v4, singleprecision, common, gtx1080ti bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 32 Maximum memory per user 3960G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 3 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1000 Maximum memory per user 20000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 48 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 40 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 42 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 9 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 19 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 6 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 39 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 
E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per group 100 Maximum GPUs per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . 
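For example, a group whose nodes are in a hypothetical pi_groupname partition could target them as sketched below; the script name and resource values are placeholders.
# batch submission to a hypothetical private partition
sbatch --partition=pi_groupname batch.sh
# or an interactive session on the same nodes (placeholder resource values)
salloc --partition=pi_groupname --cpus-per-task=4 --mem=16G --time=2:00:00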
PI Partitions (click to expand) pi_breaker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_breaker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 23 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_bunick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_bunick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_butterwick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_butterwick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_chenlab Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_chenlab partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_cryo_realtime Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryo_realtime partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_cryoem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryoem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 6 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_deng Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_deng partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 pi_dewan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dewan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_dijk Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_dijk partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 pi_dunn Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dunn partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_edwards Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_edwards partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_falcone Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_falcone partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 1 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_galvani Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_galvani partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_gerstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gerstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 11 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi pi_gerstein_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_gerstein_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv pi_gruen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gruen partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_hall Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_hall_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_jadi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jadi partition are subject to the following limits: Limit Value Maximum job time limit 365-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_jetz Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jetz partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_kleinstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kleinstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_krishnaswamy Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_krishnaswamy partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 pi_ma Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ma partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_medzhitov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_medzhitov partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_miranker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_miranker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_reinisch Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_reinisch partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 pi_sestan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_sestan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi pi_sigworth Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sigworth partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_sindelar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
Job Limits Jobs submitted to the pi_sindelar partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_tomography Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_tomography partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 24 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 pi_townsend Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_townsend partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_tsang Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsang partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, pi pi_ya-chi_ho Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ya-chi_ho partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_yong_xiong Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_yong_xiong partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 pi_zhao Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhao partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi YCGA Partitions The following partitions are intended for projects related to the Yale Center for Genome Analysis . Please do not use these partitions for other projects. Access is granted on a group basis. If you need access to these partitions, please contact us to get approved and added. YCGA Partitions (click to expand) ycga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum CPUs per group 512 Maximum memory per group 3934G Maximum CPUs per user 256 Maximum memory per user 1916G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_admins Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 64 Maximum memory per user 1991G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi ycga_long Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_long partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum CPUs per group 64 Maximum memory per group 479G Maximum CPUs per user 32 Maximum memory per user 239G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi Public Datasets We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data : \u251c\u2500\u2500 alphafold-2.3 \u251c\u2500\u2500 alphafold-2.2 (deprecated) \u251c\u2500\u2500 alphafold-2.0 (deprecated) \u251c\u2500\u2500 annovar \u2502 \u2514\u2500\u2500 humandb \u251c\u2500\u2500 cryoem \u251c\u2500\u2500 db \u2502 \u251c\u2500\u2500 annovar \u2502 \u251c\u2500\u2500 blast \u2502 \u251c\u2500\u2500 busco \u2502 \u2514\u2500\u2500 Pfam \u2514\u2500\u2500 genomes \u251c\u2500\u2500 1000Genomes \u251c\u2500\u2500 10xgenomics \u251c\u2500\u2500 Aedes_aegypti \u251c\u2500\u2500 Bos_taurus \u251c\u2500\u2500 Chelonoidis_nigra \u251c\u2500\u2500 Danio_rerio \u251c\u2500\u2500 Drosophila_melanogaster \u251c\u2500\u2500 Gallus_gallus \u251c\u2500\u2500 hisat2 \u251c\u2500\u2500 Homo_sapiens \u251c\u2500\u2500 Macaca_mulatta \u251c\u2500\u2500 Mus_musculus \u251c\u2500\u2500 Monodelphis_domestica \u251c\u2500\u2500 PhiX \u2514\u2500\u2500 Saccharomyces_cerevisiae \u2514\u2500\u2500 tmp \u2514\u2500\u2500 hisat2 \u2514\u2500\u2500 mouse If you would like us to host a dataset or have questions about what is currently available, please contact us . YCGA Data Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga . For more information on accessing this data as well as sequencing data retention policies, see the YCGA Data documentation . Storage McCleary has access to a number of GPFS filesystems. /vast/palmer is McCleary's primary filesystem where Home and Scratch60 directories are located. Every group on McCleary also has access to a Project allocation on the Gibbs filesystem on /gpfs/gibbs . For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. 
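As a minimal illustration of the storage commands named above (getquota and mydirectories; output is omitted here and varies by group), a quick check might look like this:

```bash
# Check your group's current storage usage and quotas on McCleary
getquota

# Print the absolute paths behind the ~/project and ~/palmer_scratch shortcuts
mydirectories
```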
Partition Root Directory Storage File Count Backups Snapshots home /vast/palmer/home.mccleary 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"McCleary"},{"location":"clusters/mccleary/#mccleary","text":"McCleary is a shared-use resource for the Yale School of Medicine (YSM), life science researchers elsewhere on campus and projects related to the Yale Center for Genome Analysis . It consists of a variety of compute nodes networked over ethernet and mounts several shared filesystems. McCleary is named for Beatrix McCleary Hamburg , who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. The McCleary HPC cluster is Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. Info Farnam or Ruddle user? Farnam and Ruddle were both retired in summer 2023. See our explainer for what you need to know about using McCleary and how it differs from Farnam and Ruddle.","title":"McCleary"},{"location":"clusters/mccleary/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Access the Cluster"},{"location":"clusters/mccleary/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/mccleary/#partitions-and-hardware","text":"McCleary is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Info YCGA sequence data user? To avoid being charged for your cpu usage for YCGA-related work, make sure to submit jobs to the ycga partition with -p ycga. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/mccleary/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 512 Maximum memory per group 6000G Maximum CPUs per user 256 Maximum memory per user 3000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 26 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 15 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common devel Use the devel partition for jobs with which you need ongoing interaction, for example exploratory analyses or debugging compilations. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 10 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, common week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 192 Maximum memory per group 2949G Maximum CPUs per user 192 Maximum memory per user 2949G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common long Use the long partition for jobs that need a longer runtime than week allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the long partition are subject to the following limits: Limit Value Maximum job time limit 28-00:00:00 Maximum CPUs per group 36 Maximum CPUs per user 36 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common transfer Use the transfer partition to stage data for your jobs to and from cluster storage . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the transfer partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 72F3 8 227 milan, 72F3, nogpu, standard, common gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
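A minimal sketch of a batch script for the gpu partition, using only options discussed on this page; the final job step is a placeholder and not part of the cluster documentation:

```bash
#!/bin/bash
#SBATCH --partition=gpu          # GPU jobs must be sent to a GPU partition
#SBATCH --time=02:00:00          # within the gpu partition's 2-00:00:00 limit
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5120       # the partition's default per-CPU memory
#SBATCH --gpus=gtx1080ti:2       # GPUs are not requested by default

./my_gpu_program                 # placeholder for your GPU-enabled application
```

The same --gpus syntax applies to salloc if you need an interactive GPU session.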
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per group 24 Maximum GPUs per user 12 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 14 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti gpu_devel Use the gpu_devel partition to debug jobs that make use of GPUs, or to develop GPU-enabled code. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu_devel partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2623_v4 8 38 gtx1080ti 4 11 broadwell, E5-2623_v4, singleprecision, common, gtx1080ti bigmem Use the bigmem partition for jobs that have memory requirements other partitions can't handle. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the bigmem partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 32 Maximum memory per user 3960G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 3 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 1000 Maximum memory per user 20000G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 48 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 4 6346 32 3960 icelake, avx512, 6346, nogpu, bigtmp, common 40 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, common 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 42 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 9 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 19 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, common 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp 2 6234 16 1486 cascadelake, avx512, 6234, nogpu, common, bigtmp 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 6 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 39 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 
E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti scavenge_gpu Use the scavenge_gpu partition to run preemptable jobs on more GPU resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge_gpu partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum GPUs per group 100 Maximum GPUs per user 64 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 20 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, common, a100-80g 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 1 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 6 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 2 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 3 5222 8 163 rtx3090 4 24 cascadelake, avx512, 5222, doubleprecision, common, rtx3090 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, common, bigtmp, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 2 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, oldest, common, gtx1080ti 3 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv 11 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti","title":"Public Partitions"},{"location":"clusters/mccleary/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. 
If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_breaker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_breaker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 23 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_bunick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_bunick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_butterwick Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_butterwick partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 a100 4 40 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, a100 pi_chenlab Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_chenlab partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_cryo_realtime Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryo_realtime partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. 
Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_cryoem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_cryoem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 12 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 6 6326 32 206 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000 9 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_deng Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_deng partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 pi_dewan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dewan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_dijk Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_dijk partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 352 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 pi_dunn Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_dunn partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_edwards Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_edwards partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_falcone Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_falcone partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 1 6240 36 352 v100 4 16 cascadelake, avx512, 6240, pi, v100 1 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_galvani Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_galvani partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 7 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_gerstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gerstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6132 28 163 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 1 6132 28 730 skylake, avx512, 6132, nogpu, standard, bigtmp, pi 11 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi 1 E7-4820_v4 40 1486 broadwell, E7-4820_v4, nogpu, pi pi_gerstein_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_gerstein_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 8358 64 983 a100 4 40 icelake, avx512, 8358, doubleprecision, bigtmp, pi, a100 1 6240 36 163 rtx3090 8 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 1 6240 36 163 rtx3090 4 24 cascadelake, avx512, 6240, doubleprecision, pi, bigtmp, rtx3090 2 E5-2660_v4 28 227 p100 2 16 broadwell, E5-2660_v4, doubleprecision, pi, p100 1 E5-2637_v4 8 101 titanv 4 12 broadwell, E5-2637_v4, doubleprecision, pi, bigtmp, titanv pi_gruen Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_gruen partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_hall Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_hall_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_hall_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 1486 cascadelake, avx512, 6240, nogpu, pi, bigtmp pi_jadi Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jadi partition are subject to the following limits: Limit Value Maximum job time limit 365-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_jetz Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_jetz partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi 4 6240 36 730 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi 4 6240 36 352 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_kleinstein Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_kleinstein partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_krishnaswamy Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_krishnaswamy partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6420 36 730 a100 4 40 cascadelake, avx512, 6420, doubleprecision, pi, bigtmp, a100 pi_ma Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ma partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_medzhitov Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_medzhitov partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 167 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_miranker Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_miranker partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 6248r 48 352 cascadelake, avx512, 6248r, nogpu, pi, bigtmp pi_ohern Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ohern partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2680_v4 28 227 broadwell, E5-2680_v4, nogpu, standard, oldest, pi pi_reinisch Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_reinisch partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5122 8 163 rtx2080 4 8 skylake, avx512, 5122, singleprecision, pi, rtx2080 pi_sestan Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_sestan partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8358 64 1991 icelake, avx512, 8358, nogpu, bigtmp, pi pi_sigworth Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_sigworth partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti pi_sindelar Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. 
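For example, a GPU request on one of these partitions might look like the following minimal sketch (the partition name, GPU type, and resource amounts are illustrative; use the values listed for your group's nodes):

```bash
# Interactive session with a single GPU (partition and GPU names are placeholders)
salloc --partition=pi_sindelar --gpus=rtx2080ti:1 --cpus-per-task=2 --mem-per-cpu=8G --time=02:00:00
```

The same --gpus option can be added to a batch script as an #SBATCH directive.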
Job Limits Jobs submitted to the pi_sindelar partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6240 36 163 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 1 E5-2637_v4 8 101 gtx1080ti 4 11 broadwell, E5-2637_v4, singleprecision, pi, gtx1080ti pi_tomography Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_tomography partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 32 Maximum GPUs per user 24 Maximum running jobs per user 2 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6242 32 981 rtx8000 2 48 cascadelake, avx512, 6242, doubleprecision, pi, bigtmp, rtx8000 8 5222 8 163 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, pi, bigtmp, rtx5000 pi_townsend Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_townsend partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 180 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi pi_tsang Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_tsang partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 8358 64 983 icelake, avx512, 8358, nogpu, bigtmp, pi pi_ya-chi_ho Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_ya-chi_ho partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 1 8268 48 352 cascadelake, avx512, 8268, nogpu, bigtmp, pi pi_yong_xiong Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! 
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the pi_yong_xiong partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 4 6326 32 479 a5000 4 24 icelake, avx512, 6326, doubleprecision, a5000, pi 1 6226r 32 163 rtx3090 4 24 cascadelake, avx512, 6226r, doubleprecision, pi, rtx3090 pi_zhao Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the pi_zhao partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 163 cascadelake, avx512, 6240, nogpu, bigtmp, standard, pi","title":"Private Partitions"},{"location":"clusters/mccleary/#ycga-partitions","text":"The following partitions are intended for projects related to the Yale Center for Genome Analysis . Please do not use these partitions for other proejcts. Access is granted on a group basis. If you need access to these partitions, please contact us to get approved and added. YCGA Partitions (click to expand) ycga Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum CPUs per group 512 Maximum memory per group 3934G Maximum CPUs per user 256 Maximum memory per user 1916G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 40 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_admins Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi ycga_bigmem Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_bigmem partition are subject to the following limits: Limit Value Maximum job time limit 4-00:00:00 Maximum CPUs per user 64 Maximum memory per user 1991G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6346 32 1991 icelake, avx512, 6346, nogpu, pi ycga_long Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the ycga_long partition are subject to the following limits: Limit Value Maximum job time limit 14-00:00:00 Maximum CPUs per group 64 Maximum memory per group 479G Maximum CPUs per user 32 Maximum memory per user 239G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 6 8362 64 479 icelake, avx512, 8362, nogpu, standard, pi","title":"YCGA Partitions"},{"location":"clusters/mccleary/#public-datasets","text":"We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data : \u251c\u2500\u2500 alphafold-2.3 \u251c\u2500\u2500 alphafold-2.2 (deprecated) \u251c\u2500\u2500 alphafold-2.0 (deprecated) \u251c\u2500\u2500 annovar \u2502 \u2514\u2500\u2500 humandb \u251c\u2500\u2500 cryoem \u251c\u2500\u2500 db \u2502 \u251c\u2500\u2500 annovar \u2502 \u251c\u2500\u2500 blast \u2502 \u251c\u2500\u2500 busco \u2502 \u2514\u2500\u2500 Pfam \u2514\u2500\u2500 genomes \u251c\u2500\u2500 1000Genomes \u251c\u2500\u2500 10xgenomics \u251c\u2500\u2500 Aedes_aegypti \u251c\u2500\u2500 Bos_taurus \u251c\u2500\u2500 Chelonoidis_nigra \u251c\u2500\u2500 Danio_rerio \u251c\u2500\u2500 Drosophila_melanogaster \u251c\u2500\u2500 Gallus_gallus \u251c\u2500\u2500 hisat2 \u251c\u2500\u2500 Homo_sapiens \u251c\u2500\u2500 Macaca_mulatta \u251c\u2500\u2500 Mus_musculus \u251c\u2500\u2500 Monodelphis_domestica \u251c\u2500\u2500 PhiX \u2514\u2500\u2500 Saccharomyces_cerevisiae \u2514\u2500\u2500 tmp \u2514\u2500\u2500 hisat2 \u2514\u2500\u2500 mouse If you would like us to host a dataset or questions about what is currently available, please contact us .","title":"Public Datasets"},{"location":"clusters/mccleary/#ycga-data","text":"Data associated with YCGA projects and sequenceers are located on the YCGA storage system, accessible at /gpfs/ycga . For more information on accessing this data as well as sequencing data retention polices, see the YCGA Data documentation .","title":"YCGA Data"},{"location":"clusters/mccleary/#storage","text":"McCleary has access to a number of GPFS filesystems. /vast/palmer is McCleary's primary filesystem where Home and Scratch60 directories are located. Every group on McCleary also has access to a Project allocation on the Gibbs filesytem on /gpfs/gibbs . For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/palmer_scratch directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in palmer_scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. 
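As a practical sketch, the commands mentioned above can be combined to keep an eye on your quotas and on scratch files nearing the 60-day purge (the find pattern and the 50-day threshold are illustrative):

```bash
# Show your group's storage quotas and current usage
getquota

# Print the absolute paths behind the ~/project and ~/palmer_scratch shortcuts
mydirectories

# List scratch files older than 50 days, i.e. approaching the 60-day purge (threshold is illustrative)
find ~/palmer_scratch -type f -mtime +50 -ls
```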
Partition Root Directory Storage File Count Backups Snapshots home /vast/palmer/home.mccleary 125GiB/user 500,000 Yes >=2 days project /gpfs/gibbs/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days scratch /vast/palmer/scratch 10TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/milgram-workstations/","text":"Milgram Workstations Host Name Lab Location cannon1.milgram.hpc.yale.internal Cannon SSS Hall cannon2.milgram.hpc.yale.internal Cannon SSS Hall casey1.milgram.hpc.yale.internal Casey SSS Hall chang1.milgram.hpc.yale.internal Chang Dunham Lab cl1.milgram.hpc.yale.internal Chun SSS Hall cl2.milgram.hpc.yale.internal Chun SSS Hall cl3.milgram.hpc.yale.internal Chun SSS Hall crockett1.milgram.hpc.yale.internal Crockett Dunham Lab gee1.milgram.hpc.yale.internal Gee Kirtland Hall gee2.milgram.hpc.yale.internal Gee Kirtland Hall hl1.milgram.hpc.yale.internal Holmes SSS Hall hl2.milgram.hpc.yale.internal Holmes SSS Hall joormann1.milgram.hpc.yale.internal Joorman Kirtland Hall","title":"Milgram Workstations"},{"location":"clusters/milgram-workstations/#milgram-workstations","text":"Host Name Lab Location cannon1.milgram.hpc.yale.internal Cannon SSS Hall cannon2.milgram.hpc.yale.internal Cannon SSS Hall casey1.milgram.hpc.yale.internal Casey SSS Hall chang1.milgram.hpc.yale.internal Chang Dunham Lab cl1.milgram.hpc.yale.internal Chun SSS Hall cl2.milgram.hpc.yale.internal Chun SSS Hall cl3.milgram.hpc.yale.internal Chun SSS Hall crockett1.milgram.hpc.yale.internal Crockett Dunham Lab gee1.milgram.hpc.yale.internal Gee Kirtland Hall gee2.milgram.hpc.yale.internal Gee Kirtland Hall hl1.milgram.hpc.yale.internal Holmes SSS Hall hl2.milgram.hpc.yale.internal Holmes SSS Hall joormann1.milgram.hpc.yale.internal Joorman Kirtland Hall","title":"Milgram Workstations"},{"location":"clusters/milgram/","text":"Milgram Milgram is a HIPAA aligned cluster intended for use on projects that may involve sensitive data. This applies to both storage and computation. If you have any questions about this policy, please contact us . Milgram is named for Dr. Stanley Milgram, a psychologist who researched the behavioral motivations behind social awareness in individuals and obedience to authority figures. He conducted several famous experiments during his professorship at Yale University including the lost-letter experiment, the small-world experiment, and the Milgram experiment. Milgram Usage Policies Users wishing to use Milgram must agree to the following: All Milgram users must have fulfilled and be current with Yale's HIPAA training requirement. Since Milgram's resources are limited, we ask that you only use Milgram for work on and storage of sensitive data, and that you do your other high performance computing on our other clusters. Multifactor Authentication on Milgram Multifactor authentication via Duo is required for all users on Milgram. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation . Access the Cluster Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Info Connections to Milgram can only be made from the Yale VPN ( access.yale.edu )--even if you are already on campus (YaleSecure or ethernet). See our VPN page for setup instructions. 
System Status and Monitoring For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) . Partitions and Hardware Milgram is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info. Public Partitions See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 324 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp interactive Use the interactive partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per user 72 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. 
You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 4 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp 18 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest Private Partitions With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_shung Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp psych_day Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum GPUs per user 20 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti psych_interactive Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_scavenge Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_scavenge partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_week Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Maximum CPUs in use 448 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest Storage /gpfs/milgram is Milgram's primary filesystem where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Note that the per-user usage breakdown only update once daily. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots home /gpfs/milgram/home 125GiB/user 500,000 Yes >=2 days project /gpfs/milgram/project 1TiB/group, increase to 4TiB on request 5,000,000 Yes >=2 days scratch60 /gpfs/milgram/scratch60 20TiB/group 15,000,000 No No","title":"Milgram"},{"location":"clusters/milgram/#milgram","text":"Milgram is a HIPAA aligned cluster intended for use on projects that may involve sensitive data. This applies to both storage and computation. If you have any questions about this policy, please contact us . Milgram is named for Dr. Stanley Milgram, a psychologist who researched the behavioral motivations behind social awareness in individuals and obedience to authority figures. He conducted several famous experiments during his professorship at Yale University including the lost-letter experiment, the small-world experiment, and the Milgram experiment.","title":"Milgram"},{"location":"clusters/milgram/#milgram-usage-policies","text":"Users wishing to use Milgram must agree to the following: All Milgram users must have fulfilled and be current with Yale's HIPAA training requirement. Since Milgram's resources are limited, we ask that you only use Milgram for work on and storage of sensitive data, and that you do your other high performance computing on our other clusters. Multifactor Authentication on Milgram Multifactor authentication via Duo is required for all users on Milgram. For most usage this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Milgram Usage Policies"},{"location":"clusters/milgram/#access-the-cluster","text":"Once you have an account , the cluster can be accessed via ssh or through the Open OnDemand web portal . Info Connections to Milgram can only be made from the Yale VPN ( access.yale.edu )--even if you are already on campus (YaleSecure or ethernet). 
See our VPN page for setup instructions.","title":"Access the Cluster"},{"location":"clusters/milgram/#system-status-and-monitoring","text":"For system status messages and the schedule for upcoming maintenance, please see the system status page . For a current node-level view of job activity, see the cluster monitor page (VPN only) .","title":"System Status and Monitoring"},{"location":"clusters/milgram/#partitions-and-hardware","text":"Milgram is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on. Job Submission Rate Limits Job submissions are limited to 200 jobs per hour . See the Rate Limits section in the Common Job Failures page for more info.","title":"Partitions and Hardware"},{"location":"clusters/milgram/#public-partitions","text":"See each tab below for more information about the available common use partitions. day Use the day partition for most batch jobs. This is the default if you don't specify one with --partition . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per user 324 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 14 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp interactive Use the interactive partition to jobs with which you need ongoing interaction. For example, exploratory analyses or debugging compilation. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp week Use the week partition for jobs that need a longer runtime than day allows. Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per user 72 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 4 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp gpu Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=gtx1080ti:2 would request 2 GeForce GTX 1080Ti GPUs per node. 
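Putting the gpu partition options together, a submission might look like the following minimal sketch (the GPU type, module name, and program are placeholders; pick a GPU type listed under Available Compute Nodes for this partition):

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=rtx5000:1          # GPU type is a placeholder; use one listed for this partition
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8G
#SBATCH --time=1-00:00:00

module load CUDA                  # illustrative module name; check `module avail` for versions
nvidia-smi                        # confirm the allocated GPU is visible to the job
./my_gpu_program                  # placeholder for your application
```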
Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the gpu partition are subject to the following limits: Limit Value Maximum job time limit 2-00:00:00 Maximum GPUs per user 4 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 scavenge Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation . Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the scavenge partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp 18 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, common, bigtmp 2 5222 8 181 rtx5000 4 16 cascadelake, avx512, 5222, doubleprecision, common, bigtmp, rtx5000 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest","title":"Public Partitions"},{"location":"clusters/milgram/#private-partitions","text":"With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare . Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us . PI Partitions (click to expand) pi_shung Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 1 6326 32 497 a40 4 48 icelake, a40, avx512, pi, 6326, singleprecision, bigtmp psych_day Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_day partition are subject to the following limits: Limit Value Maximum job time limit 1-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_gpu Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_gpu partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum GPUs per user 20 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti psych_interactive Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_interactive partition are subject to the following limits: Limit Value Maximum job time limit 06:00:00 Maximum CPUs per user 4 Maximum memory per user 32G Maximum running jobs per user 1 Maximum submitted jobs per user 1 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 2 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_scavenge Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. --time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 GPU jobs need GPUs! Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option. Job Limits Jobs submitted to the psych_scavenge partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) GPU Type GPUs/Node vRAM/GPU (GB) Node Features 10 6240 36 372 rtx2080ti 4 11 cascadelake, avx512, 6240, singleprecision, pi, bigtmp, rtx2080ti 47 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest psych_week Request Defaults Unless specified, your jobs will run with the following options to salloc and sbatch options for this partition. 
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120 Job Limits Jobs submitted to the psych_week partition are subject to the following limits: Limit Value Maximum job time limit 7-00:00:00 Maximum CPUs per group 500 Maximum memory per group 2500G Maximum CPUs per user 350 Maximum memory per user 1750G Maximum CPUs in use 448 Available Compute Nodes Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node. Count CPU Type CPUs/Node Memory/Node (GiB) Node Features 43 E5-2660_v4 28 247 broadwell, E5-2660_v4, nogpu, standard, pi, oldest","title":"Private Partitions"},{"location":"clusters/milgram/#storage","text":"/gpfs/milgram is Milgram's primary filesystem where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. You can check your current storage usage & limits by running the getquota command. Note that the per-user usage breakdown only update once daily. For information on data recovery, see the Backups and Snapshots documentation. Warning Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Partition Root Directory Storage File Count Backups Snapshots home /gpfs/milgram/home 125GiB/user 500,000 Yes >=2 days project /gpfs/milgram/project 1TiB/group, increase to 4TiB on request 5,000,000 Yes >=2 days scratch60 /gpfs/milgram/scratch60 20TiB/group 15,000,000 No No","title":"Storage"},{"location":"clusters/ruddle/","text":"Ruddle Ruddle was intended for use only on projects related to the Yale Center for Genome Analysis ; Please do not use this cluster for other projects. If you have any questions about this policy, please contact us . Ruddle was named for Frank Ruddle , a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics. Ruddle Retirement After more than seven years in service, the Ruddle HPC cluster was retired on July 24th. Ruddle was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Ruddle"},{"location":"clusters/ruddle/#ruddle","text":"Ruddle was intended for use only on projects related to the Yale Center for Genome Analysis ; Please do not use this cluster for other projects. If you have any questions about this policy, please contact us . Ruddle was named for Frank Ruddle , a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics. Ruddle Retirement After more than seven years in service, the Ruddle HPC cluster was retired on July 24th. Ruddle was replaced with the new HPC cluster, McCleary . For more information and updates see the McCleary announcement page .","title":"Ruddle"},{"location":"clusters-at-yale/","text":"Getting Started HPC Clusters Broadly speaking, a high performance computing (HPC) cluster is a collection of networked computers and data storage. We refer to individual servers in this network as nodes. Our clusters are only accessible to researchers remotely; your gateways to the cluster are the login nodes . From these nodes, you view files and dispatch jobs to other nodes across the cluster configured for computation, called compute nodes . The tool we use to manage these jobs is called a job scheduler . 
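On our clusters that scheduler is Slurm, and a few standard Slurm commands are enough to get oriented once you are logged in (a minimal sketch; the partition name is illustrative):

```bash
sinfo -s                        # summarize partitions and how many nodes are free
squeue -u $USER                 # list your own pending and running jobs
scontrol show partition day     # details for one partition (name is illustrative)
```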
All compute nodes on a cluster mount a shared filesystem ; a file server or set of servers store files on a large array of disks. This allows your jobs to access and edit your data from any compute node. See our summary of the compute and storage hardware we maintain, from which you can navigate to a detailed description of each cluster. Request an Account The first step in gaining access to one of our clusters is to request an account. All users must adhere to the YCRC HPC Policies . To understand which cluster is appropriate for you and to request an account, visit the account request page . Be a Good Cluster Citizen While using HPC resources, here are some important things to remember: Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run larger numbers of very short (less than a minute) jobs Use of the clusters is also governed by our official guidelines . Log in Once you have an account, go to our Log on to the Clusters page login information and configuration. If you want to access the clusters from outside Yale's network, you must use the Yale VPN. Schedule a Job On our clusters, you control your jobs using a job scheduling system called Slurm that allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively . This way you can directly interact with the compute node(s) in real time. The other way, which is the preferred way for multiple jobs or long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. Please see our Slurm documentation or attend the Introduction to HPC workshop for more details. Use Software To best serve the diverse needs of all our researchers, we use software modules to make multiple versions of popular software available. Modules allow you to swap between different applications and versions of those applications with relative ease. We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details. Transfer Your Files You will likely want to copy files between your computer and the clusters. There are a couple methods available to you, and the best for each situation usually depends on the size and number of files you would like to transfer. For most situations, uploading files through Open OnDemand's upload interface is the best option. This can be done directly through the file viewer interface by clicking the Upload button and dragging and dropping your files into the upload window. For more information on this as well as other upload methods, see our transferring data page. Introduction to HPC Tutorial To help new cluster users navigate their first interactive and batch jobs, we have an Introduction to HPC tutorial to correspond with the topics discussed in our Introduction to HPC YouTube video . Linux Our clusters run the Linux operating system, where we support the use of the Bash shell. A basically familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. 
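As a quick illustration of the kind of basic familiarity that helps, a handful of shell commands cover most day-to-day interaction with the clusters (file and directory names here are placeholders):

```bash
pwd                        # print the current working directory
ls -lh                     # list files with human-readable sizes
cd ~/project               # change into your project directory
less results.txt           # page through a text file (press q to quit)
cp results.txt backup/     # copy a file into another directory
man sbatch                 # read the manual page for a command
```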
There are also many excellent beginner tutorials available for free online, including the following: Unix Tutorial for Beginners Interactive Command Line Bootcamp Hands on Training We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available. Get Help If you have additional questions/comments, please contact us . Where applicable, please include the following information: Your NetID Cluster name Partition name Job ID(s) Error messages Command used to submit the job(s) Path(s) to scripts called by the submission command Path(s) to output files from your jobs","title":"Getting Started"},{"location":"clusters-at-yale/#getting-started","text":"","title":"Getting Started"},{"location":"clusters-at-yale/#hpc-clusters","text":"Broadly speaking, a high performance computing (HPC) cluster is a collection of networked computers and data storage. We refer to individual servers in this network as nodes. Our clusters are only accessible to researchers remotely; your gateways to the cluster are the login nodes . From these nodes, you view files and dispatch jobs to other nodes across the cluster configured for computation, called compute nodes . The tool we use to manage these jobs is called a job scheduler . All compute nodes on a cluster mount a shared filesystem ; a file server or set of servers store files on a large array of disks. This allows your jobs to access and edit your data from any compute node. See our summary of the compute and storage hardware we maintain, from which you can navigate to a detailed description of each cluster.","title":"HPC Clusters"},{"location":"clusters-at-yale/#request-an-account","text":"The first step in gaining access to one of our clusters is to request an account. All users must adhere to the YCRC HPC Policies . To understand which cluster is appropriate for you and to request an account, visit the account request page .","title":"Request an Account"},{"location":"clusters-at-yale/#be-a-good-cluster-citizen","text":"While using HPC resources, here are some important things to remember: Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run larger numbers of very short (less than a minute) jobs Use of the clusters is also governed by our official guidelines .","title":"Be a Good Cluster Citizen"},{"location":"clusters-at-yale/#log-in","text":"Once you have an account, go to our Log on to the Clusters page login information and configuration. If you want to access the clusters from outside Yale's network, you must use the Yale VPN.","title":"Log in"},{"location":"clusters-at-yale/#schedule-a-job","text":"On our clusters, you control your jobs using a job scheduling system called Slurm that allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively . This way you can directly interact with the compute node(s) in real time. The other way, which is the preferred way for multiple jobs or long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. 
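To make the batch route concrete, a minimal job script might look like the sketch below (the partition, module, and script names are placeholders; adjust them for your cluster and workload):

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=day          # illustrative; use a partition available on your cluster
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5G

module load Python               # illustrative module name; check `module avail`
python analyze.py                # placeholder for your actual analysis
```

You would then submit the script with sbatch and monitor it with squeue -u $USER.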
Please see our Slurm documentation or attend the Introduction to HPC workshop for more details.","title":"Schedule a Job"},{"location":"clusters-at-yale/#use-software","text":"To best serve the diverse needs of all our researchers, we use software modules to make multiple versions of popular software available. Modules allow you to swap between different applications and versions of those applications with relative ease. We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details.","title":"Use Software"},{"location":"clusters-at-yale/#transfer-your-files","text":"You will likely want to copy files between your computer and the clusters. There are a couple methods available to you, and the best for each situation usually depends on the size and number of files you would like to transfer. For most situations, uploading files through Open OnDemand's upload interface is the best option. This can be done directly through the file viewer interface by clicking the Upload button and dragging and dropping your files into the upload window. For more information on this as well as other upload methods, see our transferring data page.","title":"Transfer Your Files"},{"location":"clusters-at-yale/#introduction-to-hpc-tutorial","text":"To help new cluster users navigate their first interactive and batch jobs, we have an Introduction to HPC tutorial to correspond with the topics discussed in our Introduction to HPC YouTube video .","title":"Introduction to HPC Tutorial"},{"location":"clusters-at-yale/#linux","text":"Our clusters run the Linux operating system, where we support the use of the Bash shell. A basically familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. There are also many excellent beginner tutorials available for free online, including the following: Unix Tutorial for Beginners Interactive Command Line Bootcamp","title":"Linux"},{"location":"clusters-at-yale/#hands-on-training","text":"We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available.","title":"Hands on Training"},{"location":"clusters-at-yale/#get-help","text":"If you have additional questions/comments, please contact us . Where applicable, please include the following information: Your NetID Cluster name Partition name Job ID(s) Error messages Command used to submit the job(s) Path(s) to scripts called by the submission command Path(s) to output files from your jobs","title":"Get Help"},{"location":"clusters-at-yale/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"clusters-at-yale/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"clusters-at-yale/help-requests/","text":"Help Requests See our Get Help section for ways to get assistance, from email support to setting up 1-on-1 appointments with our staff. When requesting assistance provide the information described below (where applicable), so we can most effectively assist you. Before requesting assistance, we encourage you to take a look at the relevant documentation on this site. If you are new to the cluster, please watch our Intro to HPC tutorial available on the YCRC YouTube Channel as it covers many common usages of the systems. Troubleshoot Login If you are having trouble logging in to the cluster, please see our Troubleshoot Login guide. Information to Provide with Help Requests Whenever requesting assistance with HPC related issues, please provide the YCRC staff with the following information (where applicable) so we can investigate the problem you are encountering. To assist with providing this information, we have included instructions below on retreiving the information if you are working in the command line interface. Your NetID name of the cluster you are working on (e.g. Grace, Milgram, Ruddle or McCleary) instructions on how to repeat your issue. Please include the following: which directory are you working in or where you submitted your job Run the command pwd when you are in the directory where you encountered the issue the software modules you have loaded Run module list when you encounter the issue the commands you ran that resulted in the error or issue the name of the submission script your submitted to the scheduler with sbatch (if reporting an issue with a batch job) the error message you received, and, if applicable, the path to the output file containing the error message if you are using the default Slurm output options, this will look slurm-.out certain software may output additional information to other log files and, if applicable, include the paths to those files as well job ids for your Slurm jobs you can get the job ids for recently run jobs by running the command sacct identify the job(s) that contained the error and provide the job id(s) If possible, please paste the output into the email or include in a text file as an attachment. Screenshots or pictures are very hard for us to work with. We look forwarding to assisting you!","title":"Help Requests"},{"location":"clusters-at-yale/help-requests/#help-requests","text":"See our Get Help section for ways to get assistance, from email support to setting up 1-on-1 appointments with our staff. When requesting assistance provide the information described below (where applicable), so we can most effectively assist you. Before requesting assistance, we encourage you to take a look at the relevant documentation on this site. If you are new to the cluster, please watch our Intro to HPC tutorial available on the YCRC YouTube Channel as it covers many common usages of the systems.","title":"Help Requests"},{"location":"clusters-at-yale/help-requests/#troubleshoot-login","text":"If you are having trouble logging in to the cluster, please see our Troubleshoot Login guide.","title":"Troubleshoot Login"},{"location":"clusters-at-yale/help-requests/#information-to-provide-with-help-requests","text":"Whenever requesting assistance with HPC related issues, please provide the YCRC staff with the following information (where applicable) so we can investigate the problem you are encountering. 
To assist with providing this information, we have included instructions below on retrieving the information if you are working in the command line interface. Your NetID name of the cluster you are working on (e.g. Grace, Milgram, Ruddle or McCleary) instructions on how to repeat your issue. Please include the following: which directory are you working in or where you submitted your job Run the command pwd when you are in the directory where you encountered the issue the software modules you have loaded Run module list when you encounter the issue the commands you ran that resulted in the error or issue the name of the submission script you submitted to the scheduler with sbatch (if reporting an issue with a batch job) the error message you received, and, if applicable, the path to the output file containing the error message if you are using the default Slurm output options, this will look like slurm-<jobid>.out certain software may output additional information to other log files and, if applicable, include the paths to those files as well job ids for your Slurm jobs you can get the job ids for recently run jobs by running the command sacct identify the job(s) that contained the error and provide the job id(s) If possible, please paste the output into the email or include it in a text file as an attachment. Screenshots or pictures are very hard for us to work with. We look forward to assisting you!","title":"Information to Provide with Help Requests"},{"location":"clusters-at-yale/troubleshoot/","text":"Troubleshoot Login Checklist If you are having trouble logging into a cluster, please use the checklist below to check for common issues: Make sure you have submitted an account request and have gotten word that we created your account for the cluster. Make sure that the cluster is online in the System Status page. Check the hostname for the cluster. See the clusters page for a list. Verify that your ssh keys are set up correctly Check for your public key in the ssh key uploader . If you recently uploaded one, it will take a few minutes to appear on the cluster. If you are using macOS or Linux , make sure your private key is in ~/.ssh . If you are using Windows , make sure you have pointed MobaXterm to your private ssh key (ends in .pem) If you are asked for a passphrase when logging in, this is the ssh key passphrase you set when first creating your key pair. If you have forgotten this passphrase, you need to create a new key pair and upload a new public key. Make sure your computer is either on Yale's campus network (ethernet or YaleSecure wireless) or Yale's VPN . If you get an error like could not resolve hostname you may have lost connection to the Yale network. If you are sure you have not, make sure that you are also using the Yale DNS servers (130.132.1.9,10,11). Your home directory should only be writable by you. If you recently modified the permissions to your home directory and can't log in, contact us and we can fix the permissions for you. If you are using McCleary or Milgram , we require Duo MFA for every login. If following our MFA Troubleshooting steps doesn't work, contact the ITS Help Desk . If none of the above solves your issue, please contact us with your netid and the cluster you are attempting to connect to. Common SSH Errors Permission denied (publickey) This message means that the clusters don't (yet) have the key you are using to authenticate. Make sure you have an account on the cluster you're connecting to, that you have created an ssh key pair , and uploaded the public key .
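(As a quick check, running ls ~/.ssh in a terminal on your own machine should show your private key and a matching .pub file; the contents of the .pub file are what you upload.)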
If you recently uploaded one, it will take a few minutes appear on the cluster. REMOTE HOST IDENTIFICATION HAS CHANGED! If you are seeing the following error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! .... Offending key in /home/user/.ssh/known_hosts:34 ... This usually means that the keys that identify the cluster login nodes have changed. This can be the result of system upgrades on the cluster (see Grace August 2023 Maintenance ). It could also mean someone is trying to intercept your ssh session. Please contact us if you receive this error outside of known system upgrades. If the host keys have indeed changed on the server you are connecting to, you can edit ~/.ssh/known_hosts and remove the offending line. In the example above, you would need to delete line 34 in ~/.ssh/known_hosts before you re-connect.","title":"Troubleshoot Login"},{"location":"clusters-at-yale/troubleshoot/#troubleshoot-login","text":"","title":"Troubleshoot Login"},{"location":"clusters-at-yale/troubleshoot/#checklist","text":"If you are having trouble logging into a cluster, please use the checklist below to check for common issues: Make sure you have submitted an account request and have gotten word that we created your account for the cluster. Make sure that the cluster is online in the System Status page. Check the hostname for the cluster. See the clusters page for a list. Verify that your ssh keys are setup correctly Check for your public key in the ssh key uploader . If you recently uploaded one, it will take a few minutes appear on the cluster. If you are using macOS or Linux , make sure your private key is in ~/.ssh . If you are using Windows , make sure you have pointed MobaXterm to your private ssh key (ends in .pem) If you are asked for a passphrase when logging in, this is the ssh key passphrase you set when first creating your key pair. If you have forgotten this passphrase, you need to create a new key pair and upload a new public key. Make sure your computer is either on Yale's campus network (ethernet or YaleSecure wireless) or Yale's VPN . If you get an error like could not resolve hostname you may have lost connection to the Yale network. If you are sure you have not, make sure that you are also using the Yale DNS servers (130.132.1.9,10,11). Your home directory should only be writable by you. If you recently modified the permissions to your home directory and can't log in, contact us and we can fix the permissions for you. If you are using McCleary or Milgram , we require Duo MFA for every login. If following our MFA Troubleshooting steps doesn't work, contact the ITS Help Desk . If none of the above solve your issue, please contact us with your netid and the cluster you are attempting to connect to.","title":"Checklist"},{"location":"clusters-at-yale/troubleshoot/#common-ssh-errors","text":"","title":"Common SSH Errors"},{"location":"clusters-at-yale/troubleshoot/#permission-denied-publickey","text":"This message means that the clusters don't (yet) have they key you are using to authenticate. Make sure you have an account on the cluster you're connecting, that you have created an ssh key pair , and uploaded the public key . 
If you recently uploaded one, it will take a few minutes appear on the cluster.","title":"Permission denied (publickey)"},{"location":"clusters-at-yale/troubleshoot/#remote-host-identification-has-changed","text":"If you are seeing the following error: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! .... Offending key in /home/user/.ssh/known_hosts:34 ... This usually means that the keys that identify the cluster login nodes have changed. This can be the result of system upgrades on the cluster (see Grace August 2023 Maintenance ). It could also mean someone is trying to intercept your ssh session. Please contact us if you receive this error outside of known system upgrades. If the host keys have indeed changed on the server you are connecting to, you can edit ~/.ssh/known_hosts and remove the offending line. In the example above, you would need to delete line 34 in ~/.ssh/known_hosts before you re-connect.","title":"REMOTE HOST IDENTIFICATION HAS CHANGED!"},{"location":"clusters-at-yale/access/","text":"Log on to the Clusters To log on the cluster, you must first request an account (if you do not already have one). When using the clusters, please review and abide by our HPC usage policies and best practices . Off Campus Access You must be on the campus network to access the clusters. For off-campus access you need to use the Yale VPN . Web Portal - Open OnDemand For most users, we recommend using the web portal, Open OnDemand, to access the clusters. For hostnames and more instructions see our Open OnDemand documentation. SSH Connection For more advanced use cases that are not well supported by the Web Portal (Open OnDemand), you can connect to the clusters over the more traditional SSH connection .","title":"Log on to the Clusters"},{"location":"clusters-at-yale/access/#log-on-to-the-clusters","text":"To log on the cluster, you must first request an account (if you do not already have one). When using the clusters, please review and abide by our HPC usage policies and best practices . Off Campus Access You must be on the campus network to access the clusters. For off-campus access you need to use the Yale VPN .","title":"Log on to the Clusters"},{"location":"clusters-at-yale/access/#web-portal-open-ondemand","text":"For most users, we recommend using the web portal, Open OnDemand, to access the clusters. For hostnames and more instructions see our Open OnDemand documentation.","title":"Web Portal - Open OnDemand"},{"location":"clusters-at-yale/access/#ssh-connection","text":"For more advanced use cases that are not well supported by the Web Portal (Open OnDemand), you can connect to the clusters over the more traditional SSH connection .","title":"SSH Connection"},{"location":"clusters-at-yale/access/accounts/","text":"Accounts & Best Practices The YCRC HPC Policies can found here . All users are required to abide by the described policies. HPC Policies Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run large numbers of very short (less than a minute) jobs. Terminate interactive or Open OnDemand session when no longer in use. 
Idle sessions may be canceled without warning. Avoid workflows that generate numerous (thousands) of files as these put great stress on the shared filesystem. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details. Group Allocations A research group may request an allocation on one of Yale's HPC clusters . Each group is granted access to the common compute resources and a limited cluster storage allocation . Request an Account You may request an account on a cluster using the account request form . User accounts are personal to individual users and may not be shared. Under no circumstances may any user make use of another user\u2019s account. Inactive Accounts and Account Deletion For security and communication purposes, you must have a valid email address associated with your account. Login privileges will be disable on a regular basis for any accounts without a valid email address. Therefore, if you are leaving Yale, but will continue to use the cluster on a \"Sponsored netid\" , please contact us to update the email address associated with your account as soon as possible. If you find your login has been disabled, please contact us to provide a valid email address to have your login reinstated. Additionally, an annual account audit is performed on November 1st and any accounts associated with an inactive netids (regular and Sponsored netids) will be deactivated at that time. Note that Sponsored netids need to be renewed annually through the appropriate channels. When an account is deactivated, logins and scheduler access are disabled, the home directory is archived for 5 years and all project data owned by the account is reassigned to the group's PI. The group's PI will receive a report once a year in November with a list of deactivated group members. Every group must have a PI with a valid affiliation with Yale. If your PI has left Yale, you may be asked to identify a new faculty sponsor for your account in order to continue accessing the cluster.","title":"Accounts & Best Practices"},{"location":"clusters-at-yale/access/accounts/#accounts-best-practices","text":"The YCRC HPC Policies can found here . All users are required to abide by the described policies.","title":"Accounts & Best Practices"},{"location":"clusters-at-yale/access/accounts/#hpc-policies","text":"Do not run jobs, transfers or computation on a login node, instead submit jobs . Similarly, transfer nodes are only for data transfers. Do not run jobs or computation on the transfer nodes. Never give your password or ssh key to anyone else. Do not store any high risk data on the clusters, except Milgram . Do not run large numbers of very short (less than a minute) jobs. Terminate interactive or Open OnDemand session when no longer in use. Idle sessions may be canceled without warning. Avoid workflows that generate numerous (thousands) of files as these put great stress on the shared filesystem. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. 
Each YCRC cluster undergoes regular scheduled maintenance twice a year, see our maintenance schedule for more details.","title":"HPC Policies"},{"location":"clusters-at-yale/access/accounts/#group-allocations","text":"A research group may request an allocation on one of Yale's HPC clusters . Each group is granted access to the common compute resources and a limited cluster storage allocation .","title":"Group Allocations"},{"location":"clusters-at-yale/access/accounts/#request-an-account","text":"You may request an account on a cluster using the account request form . User accounts are personal to individual users and may not be shared. Under no circumstances may any user make use of another user\u2019s account.","title":"Request an Account"},{"location":"clusters-at-yale/access/accounts/#inactive-accounts-and-account-deletion","text":"For security and communication purposes, you must have a valid email address associated with your account. Login privileges will be disable on a regular basis for any accounts without a valid email address. Therefore, if you are leaving Yale, but will continue to use the cluster on a \"Sponsored netid\" , please contact us to update the email address associated with your account as soon as possible. If you find your login has been disabled, please contact us to provide a valid email address to have your login reinstated. Additionally, an annual account audit is performed on November 1st and any accounts associated with an inactive netids (regular and Sponsored netids) will be deactivated at that time. Note that Sponsored netids need to be renewed annually through the appropriate channels. When an account is deactivated, logins and scheduler access are disabled, the home directory is archived for 5 years and all project data owned by the account is reassigned to the group's PI. The group's PI will receive a report once a year in November with a list of deactivated group members. Every group must have a PI with a valid affiliation with Yale. If your PI has left Yale, you may be asked to identify a new faculty sponsor for your account in order to continue accessing the cluster.","title":"Inactive Accounts and Account Deletion"},{"location":"clusters-at-yale/access/advanced-config/","text":"Advanced SSH Configuration Example SSH config The following configuration is an example ssh client configuration file specific to our clusters. You can use it on Linux, Windows Subsystem for Linux (WSL) , and macOS. It allows you to use tab completion of the clusters, without the .ycrc.yale.edu suffixes (i.e. ssh grace or scp ~/my_file grace:my_file should work). It will also allow you to re-use and multiplex authenticated sessions. This means clusters that require Duo MFA will not force you to re-authenticate, as you use the same ssh connection to host multiple sessions. If you attempt to close your first connection with others running, it will wait until all others are closed. Save the text below to ~/.ssh/config and replace NETID with your Yale netid. Lines that begin with # will be ignored. 
# If you use a ssh key that is named something other than id_rsa, # you can specify your private key like this: # IdentityFile ~/.ssh/other_key_rsa # Uncomment the ForwardX11 options line to enable X11 Forwarding by default (no -Y necessary) # On a Mac you still need xquartz installed Host *.ycrc.yale.edu mccleary grace milgram User NETID #ForwardX11 yes # To re-use your connections with multi-factor authentication # Uncomment the two lines below #ControlMaster auto #ControlPath ~/.ssh/tmp/%h_%p_%r Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp For more info on ssh configuration, run: man ssh_config Store Passphrase and Use SSH Agent on macOS By default, macOS won't always remember your ssh key passphrase and keep your ssh key in the agent for SSH agent forwarding. In order to not repeatedly enter your passphrase and instead store it in your keychain, enter the following command on your Mac (just once): ssh-add -K ~/.ssh/id_rsa Or whatever your private key file is named. Note If you use homebrew your default OpenSSH may have changed. To add your key(s) to the system ssh agent, use the absolute path: /usr/bin/ssh-add Then and add the following to your ~/.ssh/config file (create this file if it doesn't exist, or add these settings to the Host *.ycrc.yale.edu ... rule if it does). Host *.ycrc.yale.edu mccleary grace milgram UseKeychain yes AddKeystoAgent yes You can view a list of the keys currently in your agent with: ssh-add -L","title":"Advanced SSH Configuration"},{"location":"clusters-at-yale/access/advanced-config/#advanced-ssh-configuration","text":"","title":"Advanced SSH Configuration"},{"location":"clusters-at-yale/access/advanced-config/#example-ssh-config","text":"The following configuration is an example ssh client configuration file specific to our clusters. You can use it on Linux, Windows Subsystem for Linux (WSL) , and macOS. It allows you to use tab completion of the clusters, without the .ycrc.yale.edu suffixes (i.e. ssh grace or scp ~/my_file grace:my_file should work). It will also allow you to re-use and multiplex authenticated sessions. This means clusters that require Duo MFA will not force you to re-authenticate, as you use the same ssh connection to host multiple sessions. If you attempt to close your first connection with others running, it will wait until all others are closed. Save the text below to ~/.ssh/config and replace NETID with your Yale netid. Lines that begin with # will be ignored. # If you use a ssh key that is named something other than id_rsa, # you can specify your private key like this: # IdentityFile ~/.ssh/other_key_rsa # Uncomment the ForwardX11 options line to enable X11 Forwarding by default (no -Y necessary) # On a Mac you still need xquartz installed Host *.ycrc.yale.edu mccleary grace milgram User NETID #ForwardX11 yes # To re-use your connections with multi-factor authentication # Uncomment the two lines below #ControlMaster auto #ControlPath ~/.ssh/tmp/%h_%p_%r Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. 
Create it with mkdir -p ~/.ssh/tmp For more info on ssh configuration, run: man ssh_config","title":"Example SSH config"},{"location":"clusters-at-yale/access/advanced-config/#store-passphrase-and-use-ssh-agent-on-macos","text":"By default, macOS won't always remember your ssh key passphrase and keep your ssh key in the agent for SSH agent forwarding. In order to not repeatedly enter your passphrase and instead store it in your keychain, enter the following command on your Mac (just once): ssh-add -K ~/.ssh/id_rsa Or whatever your private key file is named. Note If you use homebrew, your default OpenSSH may have changed. To add your key(s) to the system ssh agent, use the absolute path: /usr/bin/ssh-add Then add the following to your ~/.ssh/config file (create this file if it doesn't exist, or add these settings to the Host *.ycrc.yale.edu ... rule if it does). Host *.ycrc.yale.edu mccleary grace milgram UseKeychain yes AddKeysToAgent yes You can view a list of the keys currently in your agent with: ssh-add -L","title":"Store Passphrase and Use SSH Agent on macOS"},{"location":"clusters-at-yale/access/courses/","text":"Courses The YCRC Grace and McCleary clusters can be made available for Yale courses with a suitable computational component. The YCRC hosts over a dozen courses on the clusters every semester. Warning All course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. For Instructors If you are interested in using a YCRC cluster in your Yale course, contact us at research.computing@yale.edu. If at all possible, please let us know of your interest in using a cluster at least two weeks prior to the start of classes so we can plan accordingly, even if you have used the cluster in a previous semester. Course ID Your course will be given a specific courseid based on the Yale course catalog number. This courseid will be used in the course account names, web portal and, if applicable, node reservation. Course Accounts All members of a course, including the instructor and TFs, will be given temporary course accounts. These accounts take the form of courseid_netid . Course accounts are distinct from any research accounts a course member may already have. Use this account if connecting to the cluster via ssh . All course-related accounts are subject to the same policies and expectations as standard accounts . Course Storage Courses on the YCRC clusters are typically granted a standard 1TiB project storage quota, as well as a 125GiB home directory for each course member. If the course needs additional storage beyond the default 1TiB, please contact us at research.computing@yale.edu. See our cluster storage documentation for details about the different classifications of storage. Course-specific Web Portal Your course also has a course-specific web portal, based on Open OnDemand , accessible via the URL (replacing courseid with the id given to your course): courseid.ycrc.yale.edu Course members must use the course URL to log in to course accounts on Open OnDemand--the normal cluster portals are not accessible to course accounts. You will then authenticate using your standard NetID (without the courseid prefix) and password. As with all cluster access, you must be on the VPN to access the web portal if you are off campus. Node Reservations If the instructor has coordinated with the YCRC for dedicated nodes for the course, they are available via a \"reservation\".
The nodes can be requested using the --reservation=courseid flag. See our Slurm documentation for more information on submitting jobs. In each of the following examples, replace courseid with the id given to your course. Jobs on course reservations are subject to the restrictions of their parent partition (e.g. 24-hour walltime limit on day or 2-day walltime limit on gpu ). If your jobs need to exceed those restrictions, please have your instructor or TF contact us. Course members are welcome to use the public partitions of the cluster. However, we request that students be respectful in their usage so as not to disrupt ongoing research work. Interactive Jobs salloc -p day --reservation=courseid or if the reservation is for GPU nodes salloc -p gpu --gpus=1 --reservation=courseid Batch Jobs Add the following to your submission script: #SBATCH --reservation=courseid or if the reservation is for GPU nodes #SBATCH -p gpu --gpus=1 --reservation=courseid Web Portal In any of the app submission forms, type the courseid into the \"Reservation\" field. For standard (non-gpu) nodes, select day in the \"Partition\" field. If your node reservation contains GPU-enabled nodes, select gpu . Any course-specific apps listed under the \"Courses\" dropdown will automatically send all submitted jobs to the reservation, if one exists. Cluster Maintenance Each cluster is inaccessible twice a year for a three-day regularly scheduled maintenance. The maintenance schedule is published here . Please account for the cluster unavailability when developing course schedules and (for students) completing your assignments. End of Semester Course Deletion As mentioned above, all course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. If you would like to retain any data in your course account, please download it prior to the deletion date or, if applicable, submit a request to hpc@yale.edu to transfer the data into your research account. A reminder of the removal will be sent to the instructor to see if it needs to be delayed for any incompletes (for example). Students will not receive a reminder. Instructors, if you would like to retain course materials for future semesters, please copy them off the cluster or to a research account. Transfer Data to Research Account If you have a research account on the cluster, you can transfer any data you want to save from your course account to your research account. Warning Make sure there is sufficient free space in your research account storage to accommodate any data you are transferring from your course account using getquota . Log in to the cluster using your course account either via Terminal or the Shell app in the OOD web portal. Grant your research account access to your course account's directories (substitute in your courseid and netid in the example). # home directory setfacl -m u:netid:rX /home/courseid_netid # project directory on Grace and McCleary setfacl -m u:netid:rX /gpfs/gibbs/project/courseid/courseid_netid Log in as your research account. Check that you can access the above paths. Move to the transfer node with ssh transfer . If you are transferring a lot of data, open a tmux session so the transfer can continue if you disconnect from the cluster. Initiate a copy of the desired data using rsync .
For example: mkdir /gpfs/gibbs/project/group/netid/my_course_data rsync -av /gpfs/gibbs/project/courseid/courseid_netid/mydata /gpfs/gibbs/project/group/netid/my_course_data","title":"Courses"},{"location":"clusters-at-yale/access/courses/#courses","text":"The YCRC Grace and McCleary clusters can be made available for Yale courses with a suitable computational component. The YCRC hosts over a dozen courses on the clusters every semester. Warning All course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. For Instructors If you are interested in using a YCRC cluster in your Yale course, contact us at research.computing@yale.edu. If at all possible, please let us know of your interest in using a cluster at least two weeks prior to start of classes so we can plan accordingly, even if you have used the cluster in a previous semester.","title":"Courses"},{"location":"clusters-at-yale/access/courses/#course-id","text":"Your course will be give a specific courseid based on the Yale course catalog number. This courseid will be used in the course account names, web portal and, if applicable, node reservation.","title":"Course ID"},{"location":"clusters-at-yale/access/courses/#course-accounts","text":"All members of a course, including the instructor and TFs will be give temporary course accounts. These accounts take the form of courseid_netid . Course accounts are district from any research accounts a course member may already have. Use this account if connecting to the cluster via ssh . All course-related accounts are subject to the same policies and expectation as standard accounts .","title":"Course Accounts"},{"location":"clusters-at-yale/access/courses/#course-storage","text":"Courses on the YCRC clusters are typically granted a standard 1TiB project storage quota, as well as 125GiB home directory for each course member. If the course needs additional storage beyond the default 1TiB, please contact us at research.computing@yale.edu. See our cluster storage documentation for details about the different classifications of storage.","title":"Course Storage"},{"location":"clusters-at-yale/access/courses/#course-specific-web-portal","text":"Your course also has a course-specific web portal, based on Open OnDemand , accessible via the URL (replacing courseid with the id given to your course): courseid.ycrc.yale.edu Course members must use the course URL to log in to course accounts on Open OnDemand--the normal cluster portals are not accessible to course accounts. You will then authenticate using your standard NetID (without the courseid prefix) and password. As with all cluster access, you must be on the VPN to access the web portal if you are off campus.","title":"Course-specific Web Portal"},{"location":"clusters-at-yale/access/courses/#node-reservations","text":"If the instructor has coordinated with the YCRC for dedicated nodes for the course, they are available via a \"reservation\". The nodes can be requested using the --reservation=courseid flag. See our Slurm documentation for more information on submitting jobs. In each of the following examples, replace courseid with the id given to your course. Jobs on course reservations are subject to the restrictions of their parent partition (e.g. 24 hour walltime limit on day or 2-day walltime limit on gpu ). If your jobs need to exceed those restrictions, please have your instructor or TF contact us. 
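If you want to confirm that the course reservation exists and see which nodes and time window it covers, one option (assuming the reservation is named after your courseid, as above) is to query the scheduler from a shell on the cluster with: scontrol show reservation courseid (again replacing courseid with the id given to your course).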
Course members are welcome to use the public partitions of the cluster. However, we request that students be respectful in their usage as not to disrupt ongoing research work.","title":"Node Reservations"},{"location":"clusters-at-yale/access/courses/#interactive-jobs","text":"salloc -p day --reservation=courseid or if the reservation is for GPU nodes salloc -p gpu --gpus=1 --reservation=courseid","title":"Interactive Jobs"},{"location":"clusters-at-yale/access/courses/#batch-jobs","text":"Add the following to your submission script: #SBATCH --reservation=courseid or if the reservation is for GPU nodes #SBATCH -p gpu --gpus=1 --reservation=courseid","title":"Batch Jobs"},{"location":"clusters-at-yale/access/courses/#web-portal","text":"In any of the app submission forms, type the courseid into the \"Reservation\" field. For standard (non-gpu) nodes, select day in the \"Partition\" field. If your node reservation contains GPU-enabled nodes, select gpu . Any course-specific apps listed under the \"Courses\" dropdown will automatically send all submitted jobs to the reservation, if one exists.","title":"Web Portal"},{"location":"clusters-at-yale/access/courses/#cluster-maintenance","text":"Each cluster is inaccessible twice a year for a three day regularly scheduled maintenance. The maintenance schedule is published here . Please account for the cluster unavailability when developing course schedules and (for students) completing your assignments.","title":"Cluster Maintenance"},{"location":"clusters-at-yale/access/courses/#end-of-semester-course-deletion","text":"As mentioned above, all course allocations are temporary. All associated accounts and data will be removed one month after the last day of exams for that semester. If you would like to retain any data in your course account, please download it prior to the deletion date or, if applicable, submit a request to hpc@yale.edu to transfer the data into your research account. A reminder of the removal will be sent to the instructor to see if it needs to be delayed for any incompletes (for example). Students will not received a reminder. Instructors, if you would like to retain course materials for future semesters, please copy them off the cluster or to a research account.","title":"End of Semester Course Deletion"},{"location":"clusters-at-yale/access/courses/#transfer-data-to-research-account","text":"If you have a research account on the cluster, you can transfer any data you want to save from your course account to your research account. Warning Make sure there is sufficient free space in your research account storage to accomodate any data you are transferring from your course account using getquota . Login to the cluster using your course account either via Terminal or the Shell app in the OOD web portal. Grant your research account access to your course accounts directories (substitute in your courseid and netid in the example). # home directory setfacl -m u:netid:rX /home/courseid_netid # project directory on Grace and McCleary setfacl -m u:netid:rX /gpfs/gibbs/project/courseid/courseid_netid Log in as your research account. Check that you can access the above paths. Move to the transfer node with ssh transfer . If you are transferring a lot of data, open a tmux session so the transfer can continue if you disconnect from the cluster. Initiate a copy of the desired data using rsync . 
For example: mkdir /gpfs/gibbs/project/group/netid/my_course_data rsync -av /gpfs/gibbs/project/courseid/courseid_netid/mydata /gpfs/gibbs/project/group/netid/my_course_data","title":"Transfer Data to Research Account"},{"location":"clusters-at-yale/access/mfa/","text":"Multi-factor Authentication To improve security, access to McCleary and Milgram requires both a public key and multi-factor authentication (MFA). We use the same MFA (Duo) as is used elsewhere at Yale. To get set up with Duo, see these instructions. You will need upload your ssh public key to our site . For more info on how to use ssh, please see the SSH instructions . Once you've set up Duo and your key is registered, you can log in to the cluster. Use ssh to connect to your cluster of choice, and you will be prompted for a passcode or to select a notification option. We recommend choosing Duo Push (option 1). If you chose this option you should receive a notification on your phone. Once approved, you should be allowed to continue to log in. Note You can set up more than one phone for Duo. For example, you can set up your smartphone plus your office landline. That way, if you forget or lose your phone, you can still authenticate. For instructions on how to add additional phones go here . Connection Multiplexing and File Transfers with DUO MFA Some file transfer clients attempt new and sometimes multiple concurrent connections to transfer files for you. When this happens, you will be asked to Duo authenticate for each connection. SSH Config File On macOS and Linux-based systems setting up a config file lets you re-uses your authenticated sessions for command-line tools and tools that respect your ssh configuration. An example config file is shown below which enables SSH multiplexing ( ControlMaster ) by caching connections in a directory ( ControlPath ) for a period of time (2h, ControlPersist ). Host *.ycrc.yale.edu mccleary grace milgram User NETID # Uncomment below to enable X11 forwarding without `-Y` #ForwardX11 yes # To re-use your connections with multi-factor authentication ControlMaster auto ControlPath ~/.ssh/tmp/%h_%p_%r ControlPersist 2h Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp CyberDuck CyberDuck's interface with MFA can be stream-lined with a few additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call MobaXTerm MobaXTerm is able to cache MFA connections to reduce the frequency of push notifications. Under Settings > SSH > Advanced SSH settings set the ssh browser type to scp (enhanced speed) as seen here: MobaXTerm SSH Settings WinSCP Similarly, WinSCP can reuse existing SSH connections to reduce the frequency of push notifications. Under Options > Preferences > Background (under Transfer) and: Set Maximal number of transfers at the same time: to 1 Check the Use multiple connections for single transfer box Click OK to save settings Troubleshoot MFA If you are having problems initially registering Duo, please contact the Yale ITS Helpdesk . 
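As a general debugging aid, running your ssh client in verbose mode, for example ssh -v netid@mccleary.ycrc.yale.edu (substituting your own netid and cluster), shows which login node and key are being used and where the connection stops; including that output in a support request is often helpful.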
If you have successfully used MFA connect to a cluster before, but cannot now, first please check the following: Test MFA using http://access.yale.edu Verify that your ssh client is using the correct login node Verify you are attempting to connect from a Yale machine or via the proper VPN If all of this is true, please contact us . Include the following information (and anything else you think is helpful): Your netid Have you ever successfully used ssh and Duo to connect to a cluster? How long have you been having problems? Where are you trying to connect from? (fully qualified hostname/IP, if possible) Are you using a VPN? What is the error message you see?","title":"Multi-factor Authentication"},{"location":"clusters-at-yale/access/mfa/#multi-factor-authentication","text":"To improve security, access to McCleary and Milgram requires both a public key and multi-factor authentication (MFA). We use the same MFA (Duo) as is used elsewhere at Yale. To get set up with Duo, see these instructions. You will need upload your ssh public key to our site . For more info on how to use ssh, please see the SSH instructions . Once you've set up Duo and your key is registered, you can log in to the cluster. Use ssh to connect to your cluster of choice, and you will be prompted for a passcode or to select a notification option. We recommend choosing Duo Push (option 1). If you chose this option you should receive a notification on your phone. Once approved, you should be allowed to continue to log in. Note You can set up more than one phone for Duo. For example, you can set up your smartphone plus your office landline. That way, if you forget or lose your phone, you can still authenticate. For instructions on how to add additional phones go here .","title":"Multi-factor Authentication"},{"location":"clusters-at-yale/access/mfa/#connection-multiplexing-and-file-transfers-with-duo-mfa","text":"Some file transfer clients attempt new and sometimes multiple concurrent connections to transfer files for you. When this happens, you will be asked to Duo authenticate for each connection.","title":"Connection Multiplexing and File Transfers with DUO MFA"},{"location":"clusters-at-yale/access/mfa/#ssh-config-file","text":"On macOS and Linux-based systems setting up a config file lets you re-uses your authenticated sessions for command-line tools and tools that respect your ssh configuration. An example config file is shown below which enables SSH multiplexing ( ControlMaster ) by caching connections in a directory ( ControlPath ) for a period of time (2h, ControlPersist ). Host *.ycrc.yale.edu mccleary grace milgram User NETID # Uncomment below to enable X11 forwarding without `-Y` #ForwardX11 yes # To re-use your connections with multi-factor authentication ControlMaster auto ControlPath ~/.ssh/tmp/%h_%p_%r ControlPersist 2h Host mccleary grace milgram HostName %h.ycrc.yale.edu Warning For multiplexing to work, the ~/.ssh/tmp directory must exist. Create it with mkdir -p ~/.ssh/tmp","title":"SSH Config File"},{"location":"clusters-at-yale/access/mfa/#cyberduck","text":"CyberDuck's interface with MFA can be stream-lined with a few additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. 
\"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call","title":"CyberDuck"},{"location":"clusters-at-yale/access/mfa/#mobaxterm","text":"MobaXTerm is able to cache MFA connections to reduce the frequency of push notifications. Under Settings > SSH > Advanced SSH settings set the ssh browser type to scp (enhanced speed) as seen here: MobaXTerm SSH Settings","title":"MobaXTerm"},{"location":"clusters-at-yale/access/mfa/#winscp","text":"Similarly, WinSCP can reuse existing SSH connections to reduce the frequency of push notifications. Under Options > Preferences > Background (under Transfer) and: Set Maximal number of transfers at the same time: to 1 Check the Use multiple connections for single transfer box Click OK to save settings","title":"WinSCP"},{"location":"clusters-at-yale/access/mfa/#troubleshoot-mfa","text":"If you are having problems initially registering Duo, please contact the Yale ITS Helpdesk . If you have successfully used MFA connect to a cluster before, but cannot now, first please check the following: Test MFA using http://access.yale.edu Verify that your ssh client is using the correct login node Verify you are attempting to connect from a Yale machine or via the proper VPN If all of this is true, please contact us . Include the following information (and anything else you think is helpful): Your netid Have you ever successfully used ssh and Duo to connect to a cluster? How long have you been having problems? Where are you trying to connect from? (fully qualified hostname/IP, if possible) Are you using a VPN? What is the error message you see?","title":"Troubleshoot MFA"},{"location":"clusters-at-yale/access/ood/","text":"Web Portal (Open OnDemand) Open OnDemand (OOD) is platform for accessing the clusters that only requires a web browser. This web-portal provides a shell, file browser, and graphical interface for certain apps (like Jupyter or MATLAB). Access If you access Open OnDemand installed on YCRC clusters from off campus, you will need to first connect to the Yale VPN . Open OnDemand is available on each cluster using your NetID credentials (CAS login). The Yale CAS login is configured with the DUO authentication. We recommend that you click \"Remember me for 90 days\" when you are prompted to choose an authentication menthod for DUO. This will simplified the login process. Cluster OOD site Grace ood-grace.ycrc.yale.edu McCleary ood-mccleary.ycrc.yale.edu Milgram ood-milgram.ycrc.yale.edu The above four URLs are also called cluster OOD URLs. They are available to any user with a research account (also called a lab account) on the clusters. Your research account is the same as your NetID. OOD for Courses Each course on the YCRC clusters has its own URL to access OOD on the cluster. The URL is unique to each course and is also called course OOD. Course OODs all follow the same naming convention: coursename.ycrc.yale.edu . 'courename' is an abbreviated name given to the course by YCRC. Students must use the course URL to log in to OOD. They will with their NetID to log in but work under their student account on the cluster while they are in OOD. Course OOD and cluster OOD have different URLs, even if they use the same physical machine. Student accounts can only log in to OOD through a course OOD URL, and a regular account (same as your NetID) can only log in through the cluster OOD URL. 
Warning If you only have a student account, but try to log in through the cluster OOD URL, you will get an error in the browser: Error -- can't find user for cpsc424_test Run 'nginx_stage --help' to see a full list of available command line options. Using the URL for your course OOD will resolve the problem. Additional information about course OOD can be found at academic support . The Dashboard On login you will see the OOD dashboard. Along the top are pull-down menus for various Apps, including File Managers, Job Composer, a Shell, a list of Interactive Apps, etc. File Browser The file browser is a graphical interface to manage, upload, and download files from the clusters. You can use the built-in file editor to view and edit files from your browser without having to download and upload scripts. You can also drag-and-drop to download and upload files and directories, and move files between directories using this interface. Customize Favorite Paths Users are allowed to customize favorite paths in the file manager. Use the scripts below to add, remove, and list customized paths: ood_add_path ood_remove_path ood_list_path When you run ood_add_path from a shell command line, it will prompt you to add one path at a time, until you type 'n' to discontinue. All the paths added by you will be shown in the OOD pull-down menu for the file manager, as well as the left pane when the file manager is opened. ood_remove_path allows you to remove any of the paths added by you and ood_list_path will list all the paths added by you. After you have customized the path configuration from a shell, go to the OOD dashboard and click Develop -> Restart Web Server on the top menu bar to make the change effective immediately. Shell You can launch a traditional command-line interface to the cluster using the Shell pull-down menu. This opens a terminal in a web browser that you can use in the exact same way as when logging into the cluster via SSH. This is a convenient way to access the clusters when you don't have access to an ssh client or do not have your ssh keys. Interactive Apps We have deployed a selection of common graphical programs as Interactive Apps on Open OnDemand. Currently, we have apps for Remote Desktop, MATLAB, Mathematica, RStudio Desktop, RStudio Server, Jupyter Notebook, and more. Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. Remote Desktop Occasionally, it is helpful to use a graphical interface to explore data or run certain programs. In the past your options were to use VNC or X11 forwarding . These tools can be complex to set up or suffer from reduced performance. The Remote Desktop app from OOD simplifies the configuration of a VNC desktop session on a compute node. The MATLAB, Mathematica, and RStudio Desktop Apps are special versions of this app. To get started, choose Remote Desktop (or another desktop app) from the Interactive Apps menu on the dashboard. Use the form to request resources and decide what partition your job should run on. Use devel ( interactive on Milgram) or your lab's partition. Once you launch the job, you will be presented with a notification that your job has been queued. Depending on the resources requested, you may need to wait for a bit.
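While you wait, you can check the state of the underlying Slurm job from a shell with squeue -u netid (substituting your own netid); the job will typically show as PD (pending) until resources become available and R once it is running.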
When the job starts you will see the option to launch the Remote Desktop: Note you can share a view only link for your session if you would like to share your screen. After you click on Launch Remote Desktop, a standard desktop interface will open in a new tab. Copy/Paste In some browsers, you may have to use a special text box to copy and paste from the Remote Desktop App. Click the arrow on the left side of your window for a menu, then click the clipboard icon to get access to your Remote Desktop's clipboard. Jupyter One of the most common uses of Open OnDemand is the Jupyter interface for Python and R. You can choose either Jupyter Notebook or Jupyter Lab. By default, this app will try to launch Jupyter Notebook, unless the Start JupyterLab checkbox is selected. Make sure that you chose the right Conda environment for you from the drop-down menu. If you have not yet set one up, follow our instructions on how to create a new one. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Root directory The Jupyter root directory is set to your Home when started. Project and Scratch can be accessed via their respective symlinks in Home. If you want to access a directory that cannot be acessed through your home directory, for example Gibbs, you need to create a symlink to that directory in your home directory. ycrc_default The ycrc_default conda environment will be automatically built when you select it for the first time from Jupyter. You can also build your own Jupyter and make it available to OOD: module load miniconda conda create -n env_name jupyter jupyter-lab ycrc_conda_env.sh update Once created, ycrc_default will not be updated by OOD automatically. It must be updated by the user manually. To update ycrc_default , run the following command from a shell command line: module load miniconda conda update -n ycrc_default jupyter jupyter-lab RStudio Server Change User R Package Path To change the default path where packages installed by the user are stored, you need to add the following line of code in your $HOME/.bashrc : export R_LIBS_USER = path_to_your_local_r_packages Configure the Graphic Device When you plot in a RStudio session, you may encounter the following error: Error in RStudioGD () : Shadow graphics device error: r error 4 ( R code execution error ) In addition: Warning message: In grDevices:::png ( \"/tmp/RtmpcRxRaB/4v3450e3627g4432fa27f516348657267.png\" , : unable to open connection to X11 display '' To fix the problem, you need to configure your RStudio session to use Cairo for plotting. You can do it in your code as follows: options ( bitmapType = 'cairo' ) Alternatively, you can put the above code in .Rprofile in your home directory and the option will be picked up automatically. Clean RStudio If RStudio becomes slow to respond or completely stops responding, please stop the RStudio session and then run the following script at a shell command line: clean_rstudio.sh This will remove any temporary files created by RStudio and allow it to start anew. Troubleshoot OOD An OOD session is started and then completed immediately Check if your quota is full Reset your .bashrc and .bash_profile to their original contents (you can backup the startup files before resetting them. 
Add the changes back one at a time to see if one or more of the changes would prevent OOD from starting properly) Remove the default module collection file $HOME/.lmod.d/default.cluster-rhel8 (cluster is one of the following: grace, mccleary) or $HOME/.lmod.d/default.milgram-rhel7 for Milgram. Remote Desktop (or MATLAB, Mathematica, etc) cannot be started properly Make sure there is no initialization left by conda init in your .bashrc . Clean it with sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Run dbus-launch and make sure you see the following output: [ pl543@grace1 ~ ] $ which dbus-launch /usr/bin/dbus-launch Jupyter cannot be started properly If you are trying to launch jupyter-notebook , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-notebook /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook If you are trying to launch jupyter-lab , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-lab /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-lab RStudio with Conda R If you see NOT_FOUND in \"Conda R Environment\", it means your Conda R environment has not been properly installed. You may need to reinstall your Conda R environment and make sure r-base and r-essentials are both included. RStudio Server does not respond If you encounter a grey screen after clicking the \"Connect to RStudio Server\" button, please stop the RStudio session and run clean_rstudio.sh at a shell command line.","title":"Web Portal (Open OnDemand)"},{"location":"clusters-at-yale/access/ood/#web-portal-open-ondemand","text":"Open OnDemand (OOD) is a platform for accessing the clusters that only requires a web browser. This web portal provides a shell, file browser, and graphical interface for certain apps (like Jupyter or MATLAB).","title":"Web Portal (Open OnDemand)"},{"location":"clusters-at-yale/access/ood/#access","text":"If you access Open OnDemand installed on YCRC clusters from off campus, you will need to first connect to the Yale VPN . Open OnDemand is available on each cluster using your NetID credentials (CAS login). The Yale CAS login is configured with DUO authentication. We recommend that you click \"Remember me for 90 days\" when you are prompted to choose an authentication method for DUO. This will simplify the login process. Cluster OOD site Grace ood-grace.ycrc.yale.edu McCleary ood-mccleary.ycrc.yale.edu Milgram ood-milgram.ycrc.yale.edu The above three URLs are also called cluster OOD URLs. They are available to any user with a research account (also called a lab account) on the clusters. Your research account is the same as your NetID.","title":"Access"},{"location":"clusters-at-yale/access/ood/#ood-for-courses","text":"Each course on the YCRC clusters has its own URL to access OOD on the cluster. The URL is unique to each course and is also called course OOD. Course OODs all follow the same naming convention: coursename.ycrc.yale.edu . 'coursename' is an abbreviated name given to the course by YCRC. Students must use the course URL to log in to OOD. They will log in with their NetID but work under their student account on the cluster while they are in OOD. Course OOD and cluster OOD have different URLs, even if they use the same physical machine. Student accounts can only log in to OOD through a course OOD URL, and a regular account (same as your NetID) can only log in through the cluster OOD URL. 
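As a concrete illustration of the default module collection step in the OOD troubleshooting list above, the file name depends on which cluster you use; on Grace, for example, the commands would look like this (listing the directory first lets you confirm the exact file name before deleting it):
ls $HOME/.lmod.d/
rm $HOME/.lmod.d/default.grace-rhel8
On McCleary the file would be default.mccleary-rhel8, and on Milgram default.milgram-rhel7, as noted above.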
Warning If you only have a student account, but try to log in through the cluster OOD URL, you will get an error in the browser: Error -- can't find user for cpsc424_test Run 'nginx_stage --help' to see a full list of available command line options. Use the URL for your course OOD will resolve the problem. Additional information about course OOD can be found at academic support .","title":"OOD for Courses"},{"location":"clusters-at-yale/access/ood/#the-dashboard","text":"On login you will see the OOD dashboard. Along the top are pull-down menus for various Apps, including File Managers, Job Composer, a Shell, a list of Interactive Apps, etc.","title":"The Dashboard"},{"location":"clusters-at-yale/access/ood/#file-browser","text":"The file browser is a graphical interface to manage, upload, and download files from the clusters. You can use the built-in file editor to view and edit files from your browser without having to download and upload scripts. You can also drag-and-drop to download and upload files and directories, and move files between directories using this interface.","title":"File Browser"},{"location":"clusters-at-yale/access/ood/#customize-favorite-paths","text":"Users are allowed to customize favorite paths in the file manager. Using the scripts below to add, remove, and list customized paths: ood_add_path ood_remove_path ood_list_path When you run ood_add_path from a shell command line, it will prompt you to add one path at a time, until you type 'n' to discontinue. All the paths added by you will be shown in the OOD pull-down menu for the file manager, as well as the left pane when the file manager is opened. ood_remove_path allows you to remove any of the paths added by you and ood_list_path will list all the paths added by you. After you have customized the path configuration from a shell, go to the OOD dashbaord and click Develop -> Restart Web Server on the top menu bar to make the change effective immediately.","title":"Customize Favorite Paths"},{"location":"clusters-at-yale/access/ood/#shell","text":"You can launch a traditional command-line interface to the cluster using the Shell pull-down menu. This opens a terminal in a web-browser that you can use in the exact same way as when logging into the cluster via SSH. This is a convenient way to access the clusters when you don't have access to an ssh client or do not have your ssh keys.","title":"Shell"},{"location":"clusters-at-yale/access/ood/#interactive-apps","text":"We have deployed a selection of common graphical programs as Interactive Apps on Open OneDemand. Currently, we have apps for Remote Desktop, MATLAB, Mathematica, RStudio Desktop, RStudio Server, and Jupyter Notebook, etc. Warning You are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. Closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Interactive Apps"},{"location":"clusters-at-yale/access/ood/#remote-desktop","text":"Occasionally, it is helpful to use a graphical interface to explore data or run certain programs. In the past your options were to use VNC or X11 forwarding . These tools can be complex to setup or suffer from reduced performance. The Remote Desktop app from OOD simplifies the configuration of a VNC desktop session on a compute node. The MATLAB, Mathematica, and RStudio Desktop Apps are special versions of this app. 
To get started choose Remote Desktop (or another desktop app) from the Interactive Apps menu on the dashboard. Use the form to request resources and decide what partition your job should run on. Use devel ( interactive on Milgram) or your lab's partition. Once you launch the job, you will be presented with a notification that your job has been queued. Depending on the resources requested, you may need to wait for a bit. When the job starts you will see the option to launch the Remote Desktop: Note you can share a view only link for your session if you would like to share your screen. After you click on Launch Remote Desktop, a standard desktop interface will open in a new tab.","title":"Remote Desktop"},{"location":"clusters-at-yale/access/ood/#copypaste","text":"In some browsers, you may have to use a special text box to copy and paste from the Remote Desktop App. Click the arrow on the left side of your window for a menu, then click the clipboard icon to get access to your Remote Desktop's clipboard.","title":"Copy/Paste"},{"location":"clusters-at-yale/access/ood/#jupyter","text":"One of the most common uses of Open OnDemand is the Jupyter interface for Python and R. You can choose either Jupyter Notebook or Jupyter Lab. By default, this app will try to launch Jupyter Notebook, unless the Start JupyterLab checkbox is selected. Make sure that you chose the right Conda environment for you from the drop-down menu. If you have not yet set one up, follow our instructions on how to create a new one. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks.","title":"Jupyter"},{"location":"clusters-at-yale/access/ood/#root-directory","text":"The Jupyter root directory is set to your Home when started. Project and Scratch can be accessed via their respective symlinks in Home. If you want to access a directory that cannot be acessed through your home directory, for example Gibbs, you need to create a symlink to that directory in your home directory.","title":"Root directory"},{"location":"clusters-at-yale/access/ood/#ycrc_default","text":"The ycrc_default conda environment will be automatically built when you select it for the first time from Jupyter. You can also build your own Jupyter and make it available to OOD: module load miniconda conda create -n env_name jupyter jupyter-lab ycrc_conda_env.sh update Once created, ycrc_default will not be updated by OOD automatically. It must be updated by the user manually. 
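The environment commands in the ycrc_default passage above are easier to follow as separate steps. A sketch, with env_name standing in for whatever you want to call your own environment (package names reproduced as given above):
module load miniconda
conda create -n env_name jupyter jupyter-lab
ycrc_conda_env.sh update
The last command is what makes the new environment available to OOD so that it appears in the Jupyter drop-down menu.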
To update ycrc_default , run the following command from a shell command line: module load miniconda conda update -n ycrc_default jupyter jupyter-lab","title":"ycrc_default"},{"location":"clusters-at-yale/access/ood/#rstudio-server","text":"","title":"RStudio Server"},{"location":"clusters-at-yale/access/ood/#change-user-r-package-path","text":"To change the default path where packages installed by the user are stored, you need to add the following line of code in your $HOME/.bashrc : export R_LIBS_USER = path_to_your_local_r_packages","title":"Change User R Package Path"},{"location":"clusters-at-yale/access/ood/#configure-the-graphic-device","text":"When you plot in a RStudio session, you may encounter the following error: Error in RStudioGD () : Shadow graphics device error: r error 4 ( R code execution error ) In addition: Warning message: In grDevices:::png ( \"/tmp/RtmpcRxRaB/4v3450e3627g4432fa27f516348657267.png\" , : unable to open connection to X11 display '' To fix the problem, you need to configure your RStudio session to use Cairo for plotting. You can do it in your code as follows: options ( bitmapType = 'cairo' ) Alternatively, you can put the above code in .Rprofile in your home directory and the option will be picked up automatically.","title":"Configure the Graphic Device"},{"location":"clusters-at-yale/access/ood/#clean-rstudio","text":"If RStudio becomes slow to respond or completely stops responding, please stop the RStudio session and then run the following script at a shell command line: clean_rstudio.sh This will remove any temporary files created by RStudio and allow it to start anew.","title":"Clean RStudio"},{"location":"clusters-at-yale/access/ood/#troubleshoot-ood","text":"","title":"Troubleshoot OOD"},{"location":"clusters-at-yale/access/ood/#an-ood-session-is-started-and-then-completed-immediately","text":"Check if your quota is full Reset your .bashrc and .bash_profile to their original contents (you can backup the startup files before resetting them. Add the changes back one at a time to see if one or more of the changes would affect OOD from starting properly) Remove the default module collection file $HOME/.lmod.d/default.cluster-rhel8 (cluster is one of the following: grace, mccleary) or $HOME/.lmod.d/default.milgram-rhel7 for Milgram.","title":"An OOD session is started and then completed immediately"},{"location":"clusters-at-yale/access/ood/#remote-desktop-or-matlab-mathematica-etc-cannot-be-started-properly","text":"Make sure there is no initialization left by conda init in your .bashrc . 
Clean it with sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Run dbus-launch and make sure you see the following output: [ pl543@grace1 ~ ] $ which dbus-launch /usr/bin/dbus-launch","title":"Remote Desktop (or MATLAB, Mathematica, etc) cannot be started properly"},{"location":"clusters-at-yale/access/ood/#jupyter-cannot-be-started-properly","text":"If you are trying to launch jupyter-notebook , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-notebook /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook If you are trying to launch jupyter-lab , make sure it is available in your jupyter conda environment: ( ycrc_default )[ pl543@grace1 ~ ] $ which jupyter-lab /gpfs/gibbs/project/support/pl543/conda_envs/ycrc_default/bin/jupyter-notebook","title":"Jupyter cannot be started properly"},{"location":"clusters-at-yale/access/ood/#rstudio-with-conda-r","text":"If you see NOT_FOUND in \"Conda R Environment\", it means your Conda R environment has not been properly installed. You may need to reinstall your Conda R environment and make sure r-base r-essentials are both included.","title":"RStudio with Conda R"},{"location":"clusters-at-yale/access/ood/#rstudio-server-does-not-respond","text":"If you encounter a grey screen after clicking the \"Connect to RStudio Server\" button, please stop the RStudio session and run clean-rstudio.sh at a shell command line.","title":"RStudio Server does not respond"},{"location":"clusters-at-yale/access/ssh/","text":"SSH Connection For more advanced use cases that are not well supported by the Web Portal (Open OnDemand) , you can connect to the cluster over the more traditional SSH connection. Overview Request an account (if you do not already have one). Send us your public SSH key with our SSH key uploader . Allow up to ten minutes for it to propagate. Once we have your public key you can connect with ssh netid@clustername.ycrc.yale.edu . Login node addresses and other details of the clusters, such as scheduler partitions and storage, can be found on the clusters page . To use graphical programs on the clusters, please see our guides on Open OnDemand or X11 Forwarding . If you are having trouble logging in : please read the rest of this page and our Troubleshoot Login page, then contact us if you're still having issues. What are SSH keys SSH (Secure Shell) keys are a set of two pieces of information that you use to identify yourself and encrypt communication to and from a server. Usually this takes the form of two files: a public key (often saved as id_rsa.pub ) and a private key ( id_rsa or id_rsa.ppk ). To use an analogy, your public key is like a lock and your private key is what unlocks it. It is ok for others to see the lock (public key), but anyone who knows the private key can open your lock (and impersonate you). When you connect to a remote server in order to sign in, it will present your lock. You prove your identity by unlocking it with your secret key. As you continue communicating with the remote server, the data sent to you is also locked with your public key such that only you can unlock it with your private key. We use an automated system to distribute your public key onto the clusters, which you can log in to here . It is only accessible on campus or through the Yale VPN . All the public keys that are authorized to your account are stored in the file ~/.ssh/authorized_keys on the clusters you have been given access to. 
If you use multiple computers, you can either keep the same ssh key pair on every one or have a different set for each. Having only one is less complicated, but if your key pair is compromised you have to be worried about everywhere it is authorized. Warning Keep your private keys private! Anyone who has them can assume your identity on any server where your keys are authorized. We will never ask for your private key . For further reading we recommend starting with the Wikipedia articles about public-key cryptography and challenge-response authentication . macOS and Linux Generate Your Key Pair on macOS and Linux To generate a new key pair, first open a terminal/xterm session. If you are on macOS, open Applications -> Utilities -> Terminal. Generate your public and private ssh keys. Type the following into the terminal window: ssh-keygen Your terminal should respond: Generating public/private rsa key pair. Enter file in which to save the key (/home/yourusername/.ssh/id_rsa): Press Enter to accept the default value. Your terminal should respond: Enter passphrase (empty for no passphrase): Choose a secure passphrase. Your passphrase will prevent access to your account in the event your private key is stolen. You will not see any characters appear on the screen as you type. The response will be: Enter same passphrase again: Enter the passphrase again. The key pair is generated and written to a directory called .ssh in your home directory. The public key is stored in ~/.ssh/id_rsa.pub . If you forget your passphrase, it cannot be recovered. Instead, you will need to generate and upload a new SSH key pair. Next, upload your public SSH key on the cluster. Run the following command in a terminal: cat ~/.ssh/id_rsa.pub Copy and paste the output to our SSH key uploader . Note: It can take a few minutes for newly uploaded keys to sync out to the clusters so your login may not work immediately. Connect on macOS and Linux Once your key has been copied to the appropriate places on the clusters, you can log in with the command: ssh netid@clustername.ycrc.yale.edu Check out our Advanced SSH Configuration for tips on maintaining connections and adding tab complete to your ssh commands on linux/macOS. Windows We recommend using the Web Portal (Open OnDemand) to connect to the clusters from Windows. If you need advanced features beyond the web portal, we recommend using MobaXterm . MobaXterm You can download, extract & install MobaXterm from this page . We recommend using the \"Installer Edition\", but make sure to extract the zip file before running the installer. You can also use one of the Windows Subsystem for Linux (WSL) distributions and follow the Linux instructions above. However, you will probably run into issues if you try to use any graphical applications. Generate Your Key Pair on Windows First, generate an SSH key pair if you haven't already: Open MobaXterm. From the top menu choose Tools -> MobaKeyGen (SSH key generator). Leave all defaults and click the \"Generate\" button. Wiggle your mouse. Click \"Save public key\" and save your public key as id_rsa.pub. Choose a secure passphrase and enter into the two relevant fields. Your passphrase will prevent access to your account in the event your private key is stolen. Click \"Save private key\" and save your private key as id_rsa.ppk (this one is secret, don't give it to other people ). Copy the text of your public key and paste it into the text box in our SSH key uploader . Your key will be synced out to the clusters in a few minutes. 
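For macOS and Linux users connecting from a terminal as described earlier on this page, a few lines in ~/.ssh/config can save retyping the full login command. This is a minimal sketch using standard OpenSSH options; the alias grace and the netid are placeholders:
Host grace
    HostName grace.ycrc.yale.edu
    User netid
    IdentityFile ~/.ssh/id_rsa
    ServerAliveInterval 60
With this in place, ssh grace is equivalent to ssh netid@grace.ycrc.yale.edu, and ServerAliveInterval helps keep idle connections from dropping. See the Advanced SSH Configuration guide mentioned above for more options.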
Connect with MobaXterm To make a new connection to one of the clusters: Open MobaXterm. From the top menu select Sessions -> New Session. Click the SSH icon in the top left. Enter the cluster login node address (e.g. grace.ycrc.yale.edu) as the Remote Host. Check \"Specify Username\" and Enter your netID as the the username. Click the \"Advanced SSH Settings\" tab and check the \"Use private key box\", then click the file icon / magnifying glass to choose where you saved your private key (id_rsa.ppk). Click OK. In the future, your session should be saved in the sessions bar on the left in the main window.","title":"Connect with SSH"},{"location":"clusters-at-yale/access/ssh/#ssh-connection","text":"For more advanced use cases that are not well supported by the Web Portal (Open OnDemand) , you can connect to the cluster over the more traditional SSH connection.","title":"SSH Connection"},{"location":"clusters-at-yale/access/ssh/#overview","text":"Request an account (if you do not already have one). Send us your public SSH key with our SSH key uploader . Allow up to ten minutes for it to propagate. Once we have your public key you can connect with ssh netid@clustername.ycrc.yale.edu . Login node addresses and other details of the clusters, such as scheduler partitions and storage, can be found on the clusters page . To use graphical programs on the clusters, please see our guides on Open OnDemand or X11 Forwarding . If you are having trouble logging in : please read the rest of this page and our Troubleshoot Login page, then contact us if you're still having issues.","title":"Overview"},{"location":"clusters-at-yale/access/ssh/#what-are-ssh-keys","text":"SSH (Secure Shell) keys are a set of two pieces of information that you use to identify yourself and encrypt communication to and from a server. Usually this takes the form of two files: a public key (often saved as id_rsa.pub ) and a private key ( id_rsa or id_rsa.ppk ). To use an analogy, your public key is like a lock and your private key is what unlocks it. It is ok for others to see the lock (public key), but anyone who knows the private key can open your lock (and impersonate you). When you connect to a remote server in order to sign in, it will present your lock. You prove your identity by unlocking it with your secret key. As you continue communicating with the remote server, the data sent to you is also locked with your public key such that only you can unlock it with your private key. We use an automated system to distribute your public key onto the clusters, which you can log in to here . It is only accessible on campus or through the Yale VPN . All the public keys that are authorized to your account are stored in the file ~/.ssh/authorized_keys on the clusters you have been given access to. If you use multiple computers, you can either keep the same ssh key pair on every one or have a different set for each. Having only one is less complicated, but if your key pair is compromised you have to be worried about everywhere it is authorized. Warning Keep your private keys private! Anyone who has them can assume your identity on any server where your keys are authorized. We will never ask for your private key . 
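To check what key material you already have on your local machine before uploading anything, you can list your .ssh directory and print the public half of the pair (file names as described above; only ever share the .pub file):
ls ~/.ssh/
cat ~/.ssh/id_rsa.pub
The contents of id_rsa.pub are what you paste into the SSH key uploader; the matching id_rsa file stays private.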
For further reading we recommend starting with the Wikipedia articles about public-key cryptography and challenge-response authentication .","title":"What are SSH keys"},{"location":"clusters-at-yale/access/ssh/#macos-and-linux","text":"","title":"macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#generate-your-key-pair-on-macos-and-linux","text":"To generate a new key pair, first open a terminal/xterm session. If you are on macOS, open Applications -> Utilities -> Terminal. Generate your public and private ssh keys. Type the following into the terminal window: ssh-keygen Your terminal should respond: Generating public/private rsa key pair. Enter file in which to save the key (/home/yourusername/.ssh/id_rsa): Press Enter to accept the default value. Your terminal should respond: Enter passphrase (empty for no passphrase): Choose a secure passphrase. Your passphrase will prevent access to your account in the event your private key is stolen. You will not see any characters appear on the screen as you type. The response will be: Enter same passphrase again: Enter the passphrase again. The key pair is generated and written to a directory called .ssh in your home directory. The public key is stored in ~/.ssh/id_rsa.pub . If you forget your passphrase, it cannot be recovered. Instead, you will need to generate and upload a new SSH key pair. Next, upload your public SSH key on the cluster. Run the following command in a terminal: cat ~/.ssh/id_rsa.pub Copy and paste the output to our SSH key uploader . Note: It can take a few minutes for newly uploaded keys to sync out to the clusters so your login may not work immediately.","title":"Generate Your Key Pair on macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#connect-on-macos-and-linux","text":"Once your key has been copied to the appropriate places on the clusters, you can log in with the command: ssh netid@clustername.ycrc.yale.edu Check out our Advanced SSH Configuration for tips on maintaining connections and adding tab complete to your ssh commands on linux/macOS.","title":"Connect on macOS and Linux"},{"location":"clusters-at-yale/access/ssh/#windows","text":"We recommend using the Web Portal (Open OnDemand) to connect to the clusters from Windows. If you need advanced features beyond the web portal, we recommend using MobaXterm .","title":"Windows"},{"location":"clusters-at-yale/access/ssh/#mobaxterm","text":"You can download, extract & install MobaXterm from this page . We recommend using the \"Installer Edition\", but make sure to extract the zip file before running the installer. You can also use one of the Windows Subsystem for Linux (WSL) distributions and follow the Linux instructions above. However, you will probably run into issues if you try to use any graphical applications.","title":"MobaXterm"},{"location":"clusters-at-yale/access/ssh/#generate-your-key-pair-on-windows","text":"First, generate an SSH key pair if you haven't already: Open MobaXterm. From the top menu choose Tools -> MobaKeyGen (SSH key generator). Leave all defaults and click the \"Generate\" button. Wiggle your mouse. Click \"Save public key\" and save your public key as id_rsa.pub. Choose a secure passphrase and enter into the two relevant fields. Your passphrase will prevent access to your account in the event your private key is stolen. Click \"Save private key\" and save your private key as id_rsa.ppk (this one is secret, don't give it to other people ). 
Copy the text of your public key and paste it into the text box in our SSH key uploader . Your key will be synced out to the clusters in a few minutes.","title":"Generate Your Key Pair on Windows"},{"location":"clusters-at-yale/access/ssh/#connect-with-mobaxterm","text":"To make a new connection to one of the clusters: Open MobaXterm. From the top menu select Sessions -> New Session. Click the SSH icon in the top left. Enter the cluster login node address (e.g. grace.ycrc.yale.edu) as the Remote Host. Check \"Specify Username\" and Enter your netID as the the username. Click the \"Advanced SSH Settings\" tab and check the \"Use private key box\", then click the file icon / magnifying glass to choose where you saved your private key (id_rsa.ppk). Click OK. In the future, your session should be saved in the sessions bar on the left in the main window.","title":"Connect with MobaXterm"},{"location":"clusters-at-yale/access/vnc/","text":"VNC As an alternative to X11 Forwarding, using VNC to access the cluster is another way to run graphically intensive applications. Open OnDemand On the clusters, we have web dashboards set up that can run VNC for you as a job and forward your session back to you via your browser using Open OnDemand . To use the Remote Desktop tab, browse under the \"interactive apps\" drop-down menu item. We strongly encourage using Open OnDemand unless you have specific requirements otherwise. Setup vncserver on a Cluster Connect to the cluster with X11 forwarding enabled. If on Linux or Mac, ssh -Y netid@cluster , or if on Windows, follow our X11 forwarding guide . Start an interactive job on cluster with the --x11 flag (see Slurm for more information). For this description, we\u2019ll assume you were given node r801u30n01: salloc --x11 On that node, run the VNCserver. You\u2019ll see something like: r801u30n01.grace$ vncserver New 'r801u30n01.grace.ycrc.yale.edu:1 (kln26)' desktop is r801u30n01.grace.ycrc.yale.edu:1 Creating default startup script /home/kln26/.vnc/xstartup Starting applications specified in /home/kln26/.vnc/xstartup Log file is /home/kln26/.vnc/r801u30n01.grace.ycrc.yale.edu:1.log The :1 means that your DISPLAY is :1. You\u2019ll need that later, so note it. The first time you run \"vncserver\", you\u2019ll also be asked to select a password for allowing access. On MacOS, if connecting with TurboVNC throws a security exception such as \"javax.net.ssl.SSLHandshakeException\", try adding the SecurityTypes option when starting vncserver on the cluster: vncserver -SecurityTypes VNC,OTP,UnixLogin,None Connect from your local machine (laptop/desktop) macOs/Linux From a shell on your local machine, run the following ssh command: ssh -Y -L7777:r801u30n01:5901 YourNetID@cluster_login_node This will set up a tunnel from your local port 7777 to port 5901 on r801u30n01. You will need to customize this command to your situation. The 5901 is for display :1. In general, you should put 5900+DISPLAY. The 7777 is arbitrary; any number above 3000 will likely work. You\u2019ll need the number you chose for the next step. On your local machine, start the vncviewer application. Depending on your local operating system, you may need to install this. We recommend TurboVNC for Mac. When you start the viewer, you\u2019ll need to tell it which host and port to attach to. You want to specify the local end of the tunnel. In the above case, that would be localhost::7777. Exactly how you specify this will depend a bit on which viewer you use. 
E.g: vncviewer localhost::7777 You should be prompted for the password you set when you started the server. Now you are in a GUI environment and can run IGV or any other rich GUI application. /home/bioinfo/software/IGV/IGV_2.2.0/igv.sh Windows In MobaXterm, create a new Session (available in the menu bar) and then select the VNC session. To fill out the VNC Session setup, click the \"Network settings\" tab and check the box for \"Connect through SSH gateway (jump host). Then fill out the boxes as follows: Remote hostname or IP Address: name of the node running your VNC server (e.g. r801u30n01) Port: 5900 + the DISPLAY number from above (e.g. 5901 for DISPLAY = 1 ) Gateway SSH server: ssh address of the cluster (e.g. grace.ycrc.yale.edu) Port: 22 (should be default) User: netid Use private key: check this box and click to point to your private key file you use to connect to the cluster When you are done, click OK. If promoted for a password for \"localhost\", provide the vncserver password you specified in the previous step. If the VNC server looks very pixelated and your mouse movements seem laggy, try clicking the \"Toggle scaling\" button at the top of the VNC window. Example Configuration: Clean Up When you are all finished, you can kill the vncserver by doing this in the same shell you used to start it (replace :1 by your display number): vncserver -kill :1","title":"VNC"},{"location":"clusters-at-yale/access/vnc/#vnc","text":"As an alternative to X11 Forwarding, using VNC to access the cluster is another way to run graphically intensive applications.","title":"VNC"},{"location":"clusters-at-yale/access/vnc/#open-ondemand","text":"On the clusters, we have web dashboards set up that can run VNC for you as a job and forward your session back to you via your browser using Open OnDemand . To use the Remote Desktop tab, browse under the \"interactive apps\" drop-down menu item. We strongly encourage using Open OnDemand unless you have specific requirements otherwise.","title":"Open OnDemand"},{"location":"clusters-at-yale/access/vnc/#setup-vncserver-on-a-cluster","text":"Connect to the cluster with X11 forwarding enabled. If on Linux or Mac, ssh -Y netid@cluster , or if on Windows, follow our X11 forwarding guide . Start an interactive job on cluster with the --x11 flag (see Slurm for more information). For this description, we\u2019ll assume you were given node r801u30n01: salloc --x11 On that node, run the VNCserver. You\u2019ll see something like: r801u30n01.grace$ vncserver New 'r801u30n01.grace.ycrc.yale.edu:1 (kln26)' desktop is r801u30n01.grace.ycrc.yale.edu:1 Creating default startup script /home/kln26/.vnc/xstartup Starting applications specified in /home/kln26/.vnc/xstartup Log file is /home/kln26/.vnc/r801u30n01.grace.ycrc.yale.edu:1.log The :1 means that your DISPLAY is :1. You\u2019ll need that later, so note it. The first time you run \"vncserver\", you\u2019ll also be asked to select a password for allowing access. 
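To make the 5900 + DISPLAY arithmetic above concrete: if vncserver reports display :2 on a node named r801u30n01 (both values illustrative), the VNC port is 5902, so from a macOS or Linux machine the tunnel and viewer commands become:
ssh -Y -L7777:r801u30n01:5902 YourNetID@grace.ycrc.yale.edu
vncviewer localhost::7777
As noted above, 7777 is arbitrary; any free local port above 3000 should work.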
On MacOS, if connecting with TurboVNC throws a security exception such as \"javax.net.ssl.SSLHandshakeException\", try adding the SecurityTypes option when starting vncserver on the cluster: vncserver -SecurityTypes VNC,OTP,UnixLogin,None","title":"Setup vncserver on a Cluster"},{"location":"clusters-at-yale/access/vnc/#connect-from-your-local-machine-laptopdesktop","text":"","title":"Connect from your local machine (laptop/desktop)"},{"location":"clusters-at-yale/access/vnc/#macoslinux","text":"From a shell on your local machine, run the following ssh command: ssh -Y -L7777:r801u30n01:5901 YourNetID@cluster_login_node This will set up a tunnel from your local port 7777 to port 5901 on r801u30n01. You will need to customize this command to your situation. The 5901 is for display :1. In general, you should put 5900+DISPLAY. The 7777 is arbitrary; any number above 3000 will likely work. You\u2019ll need the number you chose for the next step. On your local machine, start the vncviewer application. Depending on your local operating system, you may need to install this. We recommend TurboVNC for Mac. When you start the viewer, you\u2019ll need to tell it which host and port to attach to. You want to specify the local end of the tunnel. In the above case, that would be localhost::7777. Exactly how you specify this will depend a bit on which viewer you use. E.g: vncviewer localhost::7777 You should be prompted for the password you set when you started the server. Now you are in a GUI environment and can run IGV or any other rich GUI application. /home/bioinfo/software/IGV/IGV_2.2.0/igv.sh","title":"macOs/Linux"},{"location":"clusters-at-yale/access/vnc/#windows","text":"In MobaXterm, create a new Session (available in the menu bar) and then select the VNC session. To fill out the VNC Session setup, click the \"Network settings\" tab and check the box for \"Connect through SSH gateway (jump host). Then fill out the boxes as follows: Remote hostname or IP Address: name of the node running your VNC server (e.g. r801u30n01) Port: 5900 + the DISPLAY number from above (e.g. 5901 for DISPLAY = 1 ) Gateway SSH server: ssh address of the cluster (e.g. grace.ycrc.yale.edu) Port: 22 (should be default) User: netid Use private key: check this box and click to point to your private key file you use to connect to the cluster When you are done, click OK. If promoted for a password for \"localhost\", provide the vncserver password you specified in the previous step. If the VNC server looks very pixelated and your mouse movements seem laggy, try clicking the \"Toggle scaling\" button at the top of the VNC window. Example Configuration:","title":"Windows"},{"location":"clusters-at-yale/access/vnc/#clean-up","text":"When you are all finished, you can kill the vncserver by doing this in the same shell you used to start it (replace :1 by your display number): vncserver -kill :1","title":"Clean Up"},{"location":"clusters-at-yale/access/vpn/","text":"Access from Off Campus (VPN) Yale's clusters can only be accessed on the Yale network. Therefore, in order to access a cluster from off campus, you will need to first connect to Yale's VPN. More information about Yale's VPN can be found on the ITS website . VPN Software Windows and macOS We recommend the Cisco AnyConnect VPN Client, which can be downloaded from the ITS Software Library . Linux On Linux, you can use openconnect to connect to one of Yale's VPNs. If you are using the standard Gnome-based distros, use the commands below to install. 
Ubuntu/Debian sudo apt install network-manager-openconnect-gnome Fedora/CentOS sudo yum install NetworkManager-openconnect Connect via VPN You will need to connect via the VPN client using the profile \"access.yale.edu\". Multi-factor Authentication (MFA) Authentication for the VPN requires multi-factor authentication via Duo in addition to your normal Yale credentials (email address and netid password). After you select \"Connect\" in the above dialog box, it will launch a web page with a prompt to login with your email address, netid password and MFA method. You can click \"Other options\" to choose your authentication method. If you choose \"Duo Push\", simply tap \"Approve\" on your mobile device. If you choose \"Duo Mobile passcode\", enter the passcode from the Duo Mobile app. If you choose \"Phone call\", follow the prompts when you are called. Once you successfully authenticate with MFA, you will be connected to the VPN and should be able to log in the clusters via SSH and Open OnDemand as usual. More information about MFA at Yale can be found on the ITS website .","title":"Access from Off Campus (VPN)"},{"location":"clusters-at-yale/access/vpn/#access-from-off-campus-vpn","text":"Yale's clusters can only be accessed on the Yale network. Therefore, in order to access a cluster from off campus, you will need to first connect to Yale's VPN. More information about Yale's VPN can be found on the ITS website .","title":"Access from Off Campus (VPN)"},{"location":"clusters-at-yale/access/vpn/#vpn-software","text":"","title":"VPN Software"},{"location":"clusters-at-yale/access/vpn/#windows-and-macos","text":"We recommend the Cisco AnyConnect VPN Client, which can be downloaded from the ITS Software Library .","title":"Windows and macOS"},{"location":"clusters-at-yale/access/vpn/#linux","text":"On Linux, you can use openconnect to connect to one of Yale's VPNs. If you are using the standard Gnome-based distros, use the commands below to install. Ubuntu/Debian sudo apt install network-manager-openconnect-gnome Fedora/CentOS sudo yum install NetworkManager-openconnect","title":"Linux"},{"location":"clusters-at-yale/access/vpn/#connect-via-vpn","text":"You will need to connect via the VPN client using the profile \"access.yale.edu\".","title":"Connect via VPN"},{"location":"clusters-at-yale/access/vpn/#multi-factor-authentication-mfa","text":"Authentication for the VPN requires multi-factor authentication via Duo in addition to your normal Yale credentials (email address and netid password). After you select \"Connect\" in the above dialog box, it will launch a web page with a prompt to login with your email address, netid password and MFA method. You can click \"Other options\" to choose your authentication method. If you choose \"Duo Push\", simply tap \"Approve\" on your mobile device. If you choose \"Duo Mobile passcode\", enter the passcode from the Duo Mobile app. If you choose \"Phone call\", follow the prompts when you are called. Once you successfully authenticate with MFA, you will be connected to the VPN and should be able to log in the clusters via SSH and Open OnDemand as usual. 
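If you prefer a command-line client to the NetworkManager integration described above, openconnect can usually be pointed directly at the same VPN endpoint. This is a generic openconnect sketch, not a YCRC-documented procedure, so treat it as a starting point:
sudo openconnect https://access.yale.edu
You will typically be prompted for your Yale credentials and a Duo factor, mirroring the MFA flow described above.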
More information about MFA at Yale can be found on the ITS website .","title":"Multi-factor Authentication (MFA)"},{"location":"clusters-at-yale/access/x11/","text":"Graphical Interfaces (X11) To use a graphical interface on the clusters and you choose not to use the web portal , your connection needs to be set up for X11 forwarding, which will transmit the graphical window from the cluster back to your local machine. A simple test to see if your setup is working is to run the command xclock . You should see a simple analog clock window pop up. On macOS Download and install the latest X-Quartz release. Log out and log back in to your Mac to reset some variables When using ssh to log in to the clusters, use the -Y option to enable X11 forwarding. Example: ssh -Y netid@grace.ycrc.yale.edu Note: if you get the error \"cannot open display\", please open an X-Quartz terminal and run the following command, and then log in to the cluster from the X-Quartz terminal: launchctl load -w /Library/LaunchAgents/org.macosforge.xquartz.startx.plist On Windows We recommend MobaXterm for connecting to the clusters from Windows. It is configured for X11 forwarding out of the box and should require no additional configuration or software. Quick Test A quick and simple test to check if X11 forwarding is working is to run the command xclock in the session you expect to be forwarding. After a short delay, you should see a window with a simple clock pop up. Submit an X11 enabled Job Once configured, you'll usually want to use X11 forwarding on a compute node to do your work. To allocate a simple interactive session with X11 forwarding: salloc --x11 For more Slurm options, see our Slurm documentation .","title":"Graphical Interfaces (X11)"},{"location":"clusters-at-yale/access/x11/#graphical-interfaces-x11","text":"To use a graphical interface on the clusters and you choose not to use the web portal , your connection needs to be set up for X11 forwarding, which will transmit the graphical window from the cluster back to your local machine. A simple test to see if your setup is working is to run the command xclock . You should see a simple analog clock window pop up.","title":"Graphical Interfaces (X11)"},{"location":"clusters-at-yale/access/x11/#on-macos","text":"Download and install the latest X-Quartz release. Log out and log back in to your Mac to reset some variables When using ssh to log in to the clusters, use the -Y option to enable X11 forwarding. Example: ssh -Y netid@grace.ycrc.yale.edu Note: if you get the error \"cannot open display\", please open an X-Quartz terminal and run the following command, and then log in to the cluster from the X-Quartz terminal: launchctl load -w /Library/LaunchAgents/org.macosforge.xquartz.startx.plist","title":"On macOS"},{"location":"clusters-at-yale/access/x11/#on-windows","text":"We recommend MobaXterm for connecting to the clusters from Windows. It is configured for X11 forwarding out of the box and should require no additional configuration or software.","title":"On Windows"},{"location":"clusters-at-yale/access/x11/#quick-test","text":"A quick and simple test to check if X11 forwarding is working is to run the command xclock in the session you expect to be forwarding. After a short delay, you should see a window with a simple clock pop up.","title":"Quick Test"},{"location":"clusters-at-yale/access/x11/#submit-an-x11-enabled-job","text":"Once configured, you'll usually want to use X11 forwarding on a compute node to do your work. 
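Putting the pieces of this page together, a typical X11 session from a macOS or Linux terminal looks like the following, with netid as a placeholder and xclock serving as the quick test suggested above:
ssh -Y netid@grace.ycrc.yale.edu
salloc --x11
xclock
If the clock window appears, X11 forwarding is working end to end, from the compute node in your allocation back to your local display.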
To allocate a simple interactive session with X11 forwarding: salloc --x11 For more Slurm options, see our Slurm documentation .","title":"Submit an X11 enabled Job"},{"location":"clusters-at-yale/applications/","text":"Overview Software Modules The YCRC will install and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our module guide for more info. You can run module avail to page through all available software once you log in. Conda, Python & R You should also feel free to install things for yourself. See our Conda , Python , R guides for guidance on running these on the clusters. Compile Your Own Software For all other software, we encourage users to attempt to install their own software into their directories. Here are instructions for common software procedures. Make Cmake Apptainer : create containers and port Docker containers to the clusters (formerly know as \"Singularity\") If you run into issues with your software installations, contact us . Software Guides We provide additional guides for running specific software on the clusters as well.","title":"Overview"},{"location":"clusters-at-yale/applications/#overview","text":"","title":"Overview"},{"location":"clusters-at-yale/applications/#software-modules","text":"The YCRC will install and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our module guide for more info. You can run module avail to page through all available software once you log in.","title":"Software Modules"},{"location":"clusters-at-yale/applications/#conda-python-r","text":"You should also feel free to install things for yourself. See our Conda , Python , R guides for guidance on running these on the clusters.","title":"Conda, Python & R"},{"location":"clusters-at-yale/applications/#compile-your-own-software","text":"For all other software, we encourage users to attempt to install their own software into their directories. Here are instructions for common software procedures. Make Cmake Apptainer : create containers and port Docker containers to the clusters (formerly know as \"Singularity\") If you run into issues with your software installations, contact us .","title":"Compile Your Own Software"},{"location":"clusters-at-yale/applications/#software-guides","text":"We provide additional guides for running specific software on the clusters as well.","title":"Software Guides"},{"location":"clusters-at-yale/applications/compile/","text":"Build Software How to get software you need up and running on the clusters. caveat emptor We recommend either use existing software modules , Conda , Apptainer , or pre-compiled software where available. However, there are cases where compiling applications is necessary or desired. This can be because the pre-compiled version isn't readily available/compatible or because compiling applications on the cluster will make an appreciable difference in performance. It is also the case that many R packages are compiled at install time. When compiling applications on the clusters, it is important to consider the ways in which you expect to run the application you are endeavoring to get working. 
If you want to be able to run jobs calling your application any node on the cluster, you will need to target the oldest hardware available so that newer optimizations are not used that will fail on some nodes. If your application is already quite specialized (e.g. needs GPUs or brand-new CPU instructions), you will want to compile it natively for the subset of compute nodes on which your jobs will run. This decision is often a trade-off between faster individual jobs or jobs that can run on more nodes at once. Each of the cluster pages (see the HPC Resources page for a list) has a \"Compute Node Configurations\" section where nodes are roughly listed from oldest to newest. Illegal Instruction Instructions You may find that software compiled on newer compute nodes will fail with the error Illegal instruction (core dumped) . This includes R/Python libraries that include code that compiles from source. To remedy this issue make sure to always either: Build or install software on the oldest available nodes. You can ensure you are on the oldest hardware by specifying the oldest feature ( --constraint oldest ) in your job submission. Require that your jobs running the software in question request similar hardware to their build environment. If your software needs newer instructions using avx512 as a constraint will probably work, but limit the pool of nodes your job can run on. Either way, you will want to control where your jobs run with job constraints . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always compile in an interactive job submitted with the --constraint oldest Slurm flag if you want to ensure your program will run on all generations of the compute nodes. Conventions Local Install Because you don't have admin/root/sudo privileges on the clusters, you won't be able to use sudo and a package manager like apt , yum , etc.; You will need to adapt install instructions to allow for what is called a local or user install. If you prefer or require this method, you should create a container image (see our Apptainer guide ), then run it on the cluster. For things to work smoothly you will need to choose and stick with a prefix, or path to your installed applications and libraries. We recommend this be either in your home or project directory, something like ~/software or /path/to/project/software . Make sure you have created it before continuing. Tip If you choose a project directory prefix, it will be easier to share your applications with lab mates or other cluster users. Just make sure to use the true path (the one returned by mydirectories ). Once you've chosen a prefix you will want to add any directory with executables you want to run to your PATH environment variable, and any directores with libraries that your application(s) link to your LD_LIBRARY_PATH environment variable. Each of these tell your shell where to look when you call your application without specifying an absolute path to it. To set these variables permanently, add the following to the end of your ~/.bashrc file: # local installs export MY_PREFIX = ~/software export PATH = $MY_PREFIX /bin: $PATH export LD_LIBRARY_PATH = $MY_PREFIX /lib: $LD_LIBRARY_PATH For the remainder of the guide we'll use the $MY_PREFIX variable to refer to the prefix. See below or your application's install instructions for exactly how to specify your prefix at build/install time. Dependencies You will need to develop a build strategy that works for you and stay consistent. 
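Before settling on a dependency strategy, it may help to see how the pieces above fit together. A sketch of a portable Autotools-style build, assuming the package provides a configure script and that you set up $MY_PREFIX as shown earlier:
salloc --constraint oldest
export MY_PREFIX=~/software
./configure --prefix=$MY_PREFIX
make
make install
Running the build inside a --constraint oldest allocation keeps the resulting binaries usable on every generation of compute node, and installing under $MY_PREFIX matches the PATH and LD_LIBRARY_PATH settings shown above.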
If you're happy using libraries and toolchains that are already available on the cluster as dependencies (recommended), feel free to create module collections that serve as your environments. If you prefer to completely build your own software tree, that is ok too. Whichever route you choose, try to stick with the same version of dependencies (e.g. MPI, zlib, numpy) and compiler you're using (e.g. GCC, intel). We find that unless absolutely necessary, the newest version of a compiler or library might not be the most compatible with a wide array of scientific software so you may want to step back a few versions or try using what was available at the time your application was being developed. Autotools ( configure / make ) If your application includes instructions to run ./bootstrap , ./autogen.sh , ./configure or make , it is using the GNU Build System . Warning If you are using GCC 10+, you will need to load a separate Autotools module for your version of GCC; e.g., module load Autotools/20200321-GCCcore-10.2.0 configure If you are instructed to run ./configure to generate a Makefile, specify your prefix with the --prefix option. This creates a file, usually named Makefile that is a recipe for make to use to build your application. export MY_PREFIX = ~/software ./configure --prefix = $MY_PREFIX make install If your configure ran properly, make install should properly place your application in your prefix directory. If there is no install target specified for your application, you can either run make and copy the application to your $MY_PREFIX/bin directory or build it somewhere in $MY_PREFIX and add its relevant paths to your PATH and/or LD_LIBRARY_PATH environment variables in your ~/.bashrc file as shown in the local install section. CMake CMake is a popular cross-platform build system. On a linux system, CMake will create a Makefile in a step analogous to ./configure . It is common to create a build directory then run the cmake and make commands from there. Below is what installing to your $MY_DIRECTORY prefix might look like with CMake. CMake instructions also tend to link together the build process onto on line with && , which tells your shell to only continue to the next command if the previous one exited without error. export MY_PREFIX = ~/software mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX = $MY_PREFIX .. && make && make install","title":"Build Software"},{"location":"clusters-at-yale/applications/compile/#build-software","text":"How to get software you need up and running on the clusters.","title":"Build Software"},{"location":"clusters-at-yale/applications/compile/#caveat-emptor","text":"We recommend either use existing software modules , Conda , Apptainer , or pre-compiled software where available. However, there are cases where compiling applications is necessary or desired. This can be because the pre-compiled version isn't readily available/compatible or because compiling applications on the cluster will make an appreciable difference in performance. It is also the case that many R packages are compiled at install time. When compiling applications on the clusters, it is important to consider the ways in which you expect to run the application you are endeavoring to get working. If you want to be able to run jobs calling your application any node on the cluster, you will need to target the oldest hardware available so that newer optimizations are not used that will fail on some nodes. If your application is already quite specialized (e.g. 
needs GPUs or brand-new CPU instructions), you will want to compile it natively for the subset of compute nodes on which your jobs will run. This decision is often a trade-off between faster individual jobs or jobs that can run on more nodes at once. Each of the cluster pages (see the HPC Resources page for a list) has a \"Compute Node Configurations\" section where nodes are roughly listed from oldest to newest.","title":"caveat emptor"},{"location":"clusters-at-yale/applications/compile/#illegal-instruction-instructions","text":"You may find that software compiled on newer compute nodes will fail with the error Illegal instruction (core dumped) . This includes R/Python libraries that include code that compiles from source. To remedy this issue make sure to always either: Build or install software on the oldest available nodes. You can ensure you are on the oldest hardware by specifying the oldest feature ( --constraint oldest ) in your job submission. Require that your jobs running the software in question request similar hardware to their build environment. If your software needs newer instructions using avx512 as a constraint will probably work, but limit the pool of nodes your job can run on. Either way, you will want to control where your jobs run with job constraints . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always compile in an interactive job submitted with the --constraint oldest Slurm flag if you want to ensure your program will run on all generations of the compute nodes.","title":"Illegal Instruction Instructions"},{"location":"clusters-at-yale/applications/compile/#conventions","text":"","title":"Conventions"},{"location":"clusters-at-yale/applications/compile/#local-install","text":"Because you don't have admin/root/sudo privileges on the clusters, you won't be able to use sudo and a package manager like apt , yum , etc.; You will need to adapt install instructions to allow for what is called a local or user install. If you prefer or require this method, you should create a container image (see our Apptainer guide ), then run it on the cluster. For things to work smoothly you will need to choose and stick with a prefix, or path to your installed applications and libraries. We recommend this be either in your home or project directory, something like ~/software or /path/to/project/software . Make sure you have created it before continuing. Tip If you choose a project directory prefix, it will be easier to share your applications with lab mates or other cluster users. Just make sure to use the true path (the one returned by mydirectories ). Once you've chosen a prefix you will want to add any directory with executables you want to run to your PATH environment variable, and any directores with libraries that your application(s) link to your LD_LIBRARY_PATH environment variable. Each of these tell your shell where to look when you call your application without specifying an absolute path to it. To set these variables permanently, add the following to the end of your ~/.bashrc file: # local installs export MY_PREFIX = ~/software export PATH = $MY_PREFIX /bin: $PATH export LD_LIBRARY_PATH = $MY_PREFIX /lib: $LD_LIBRARY_PATH For the remainder of the guide we'll use the $MY_PREFIX variable to refer to the prefix. 
See below or your application's install instructions for exactly how to specify your prefix at build/install time.","title":"Local Install"},{"location":"clusters-at-yale/applications/compile/#dependencies","text":"You will need to develop a build strategy that works for you and stay consistent. If you're happy using libraries and toolchains that are already available on the cluster as dependencies (recommended), feel free to create module collections that serve as your environments. If you prefer to completely build your own software tree, that is ok too. Whichever route you choose, try to stick with the same version of dependencies (e.g. MPI, zlib, numpy) and compiler you're using (e.g. GCC, intel). We find that unless absolutely necessary, the newest version of a compiler or library might not be the most compatible with a wide array of scientific software so you may want to step back a few versions or try using what was available at the time your application was being developed.","title":"Dependencies"},{"location":"clusters-at-yale/applications/compile/#autotools-configuremake","text":"If your application includes instructions to run ./bootstrap , ./autogen.sh , ./configure or make , it is using the GNU Build System . Warning If you are using GCC 10+, you will need to load a separate Autotools module for your version of GCC; e.g., module load Autotools/20200321-GCCcore-10.2.0","title":"Autotools (configure/make)"},{"location":"clusters-at-yale/applications/compile/#configure","text":"If you are instructed to run ./configure to generate a Makefile, specify your prefix with the --prefix option. This creates a file, usually named Makefile that is a recipe for make to use to build your application. export MY_PREFIX = ~/software ./configure --prefix = $MY_PREFIX","title":"configure"},{"location":"clusters-at-yale/applications/compile/#make-install","text":"If your configure ran properly, make install should properly place your application in your prefix directory. If there is no install target specified for your application, you can either run make and copy the application to your $MY_PREFIX/bin directory or build it somewhere in $MY_PREFIX and add its relevant paths to your PATH and/or LD_LIBRARY_PATH environment variables in your ~/.bashrc file as shown in the local install section.","title":"make install"},{"location":"clusters-at-yale/applications/compile/#cmake","text":"CMake is a popular cross-platform build system. On a linux system, CMake will create a Makefile in a step analogous to ./configure . It is common to create a build directory then run the cmake and make commands from there. Below is what installing to your $MY_DIRECTORY prefix might look like with CMake. CMake instructions also tend to link together the build process onto on line with && , which tells your shell to only continue to the next command if the previous one exited without error. export MY_PREFIX = ~/software mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX = $MY_PREFIX .. && make && make install","title":"CMake"},{"location":"clusters-at-yale/applications/lifecycle/","text":"Software Module Lifecycle To keep the YCRC cluster software modules catalogs tidy, relevant, and up to date, we periodically deprecate and introduce modules. 
Deprecated Modules The two major criteria we use to decide which modules to deprecate are: A software module has not been used much in the past year We are ending support for the toolchain with which a module was built As we deprecate modules, every time you load a module that has been marked for removal a warning message will appear. The message state when the module will no appear in the module list. If you see such a message, we recommend you update your project to use a supported module as soon as possible or contacting us for help. Toolchain Support The YCRC maintains a rolling two toolchain version support model. At any given time on a cluster, we aim to support two versions of each of the major toolchains, foss and intel . The two versions are separated by two years and new software is typically installed with the later version. When we introduce a new toolchain version, we phase out support for the oldest by marking software in that toolchain for deprecation. A few months later, software in the oldest toolchain version will be removed from the module list and no longer supported by the YCRC.","title":"Module Lifecycle"},{"location":"clusters-at-yale/applications/lifecycle/#software-module-lifecycle","text":"To keep the YCRC cluster software modules catalogs tidy, relevant, and up to date, we periodically deprecate and introduce modules.","title":"Software Module Lifecycle"},{"location":"clusters-at-yale/applications/lifecycle/#deprecated-modules","text":"The two major criteria we use to decide which modules to deprecate are: A software module has not been used much in the past year We are ending support for the toolchain with which a module was built As we deprecate modules, every time you load a module that has been marked for removal a warning message will appear. The message state when the module will no appear in the module list. If you see such a message, we recommend you update your project to use a supported module as soon as possible or contacting us for help.","title":"Deprecated Modules"},{"location":"clusters-at-yale/applications/lifecycle/#toolchain-support","text":"The YCRC maintains a rolling two toolchain version support model. At any given time on a cluster, we aim to support two versions of each of the major toolchains, foss and intel . The two versions are separated by two years and new software is typically installed with the later version. When we introduce a new toolchain version, we phase out support for the oldest by marking software in that toolchain for deprecation. A few months later, software in the oldest toolchain version will be removed from the module list and no longer supported by the YCRC.","title":"Toolchain Support"},{"location":"clusters-at-yale/applications/modules/","text":"Load Software with Modules To facilitate the diverse work that happens on the YCRC clusters we compile, install, and manage software packages separately from those installed in standard system directories. We use EasyBuild to build, install, and manage packages. You can access these packages as Lmod modules. The modules involving compiled software are arranged into hierarchical toolchains that make dependencies more consistent when you load multiple modules. Warning Avoid loading Python or R modules simultaneously with conda environments. This will almost always break something. Find Modules All Available Modules To list all available modules, run: module avail Search For Modules You can search for modules or extensions with spider and avail . 
For example, to find and list all Python version 3 modules, run: module avail python/3 To find any module or extension that mentions python in its name or description, use the command: module spider python Get Module Help You can get a brief description of a module and the url to the software's homepage by running: module help modulename/version If you don't find a commonly used software package you require, contact us with a software installation request. Otherwise, check out our installation guides to install it for yourself. Load and Unload Modules Load The module load command modifies your environment so you can use the specified software package(s). This command is case-sensitive to module names. The module load command will load dependencies as needed, you don't need to load them separately. For batch jobs , add module load command(s) to your submission script. For example, to load Python version 3.8.6 and BLAST+ version 2.11.0 , find modules with matching toolchain suffixes and run the command: module load Python/3.8.6-GCCcore-10.2.0 BLAST+/2.11.0-GCCcore-10.2.0 Lmod will add python and the BLAST commands to your environment. Since both of these modules were built with the GCCcore/10.2.0 toolchain module, they will not load conflicting libraries. Recall you can see the other modules that were loaded by running module list . Module Defaults As new versions of software get installed and others are deprecated , the default module version can change over time. It is best practice to note the specific module versions you are using for a project and load those explicitly, e.g. module load Python/3.8.6-GCCcore-10.2.0 not module load Python . This makes your work more reproducible and less likely to change unexpectedly in the future. Unload You can also unload a specific module that you've previously loaded: module unload R Or unload all modules at once with: module purge Purge Lightly module purge will alert you to a sticky module that is always loaded called StdEnv . Avoid unloading StdEnv unless explicitly told to do so, othewise you will lose some important setup for the cluster you are on. Module Collections Save Collections It can be a pain to enter a long list of modules every time you return to a project. Module collections allow you to create sets of modules to load together. This method is particularly useful if you have two or more module sets that may conflict with one another. Save a collection of modules by first loading all the modules you want to save together then run: module save environment_name (replace environment_name with something more meaningful to you) Restore Collections Load a collection with module restore : module restore environment_name To modify a collection: restore it, make the desired changes by load ing and/or unload ing modules, then save it to the same name. List Collections To get a list of your collections, run: module savelist ml : A Convinient Tool Lmod provides a convinient tool called ml to simplify all of the module commands. List Module Loaded ml Load Modules ml Python/3.8.6-GCCcore-10.2.0 Unload Modules ml -Python With moudle Sub-commands ml can be used to replace the module command. It can take all the sub-commands from module and works the same way as module does. 
ml load Python R ml unload Python ml spider Python ml avail ml whatis Python ml key Python ml purge ml save test ml restore test Environment Variables To refer to the directory where the software from a module is stored, you can use the environment variable $EBROOTMODULENAME where MODULENAME is the name of the module in all caps with no spaces. This can be useful for finding the executables, libraries, or readme files that are included with the software: [ netid@node ~ ] $ module load SAMtools [ netid@node ~ ] $ echo $EBVERSIONSAMTOOLS 1 .11 [ netid@node ~ ] $ ls $EBROOTSAMTOOLS bin easybuild include lib lib64 share [ netid@node ~ ] $ ls $EBROOTSAMTOOLS /bin ace2sam maq2sam-short psl2sam.pl soap2sam.pl blast2sam.pl md5fa r2plot.lua vcfutils.lua bowtie2sam.pl md5sum-lite sam2vcf.pl wgsim export2sam.pl novo2sam.pl samtools wgsim_eval.pl interpolate_sam.pl plot-ampliconstats samtools.pl zoom2sam.pl maq2sam-long plot-bamstats seq_cache_populate.pl Further Reading You can view documentation while on the cluster using the command: man module There is even more information at the offical Lmod website and related documentation .","title":"Software Modules"},{"location":"clusters-at-yale/applications/modules/#load-software-with-modules","text":"To facilitate the diverse work that happens on the YCRC clusters we compile, install, and manage software packages separately from those installed in standard system directories. We use EasyBuild to build, install, and manage packages. You can access these packages as Lmod modules. The modules involving compiled software are arranged into hierarchical toolchains that make dependencies more consistent when you load multiple modules. Warning Avoid loading Python or R modules simultaneously with conda environments. This will almost always break something.","title":"Load Software with Modules"},{"location":"clusters-at-yale/applications/modules/#find-modules","text":"","title":"Find Modules"},{"location":"clusters-at-yale/applications/modules/#all-available-modules","text":"To list all available modules, run: module avail","title":"All Available Modules"},{"location":"clusters-at-yale/applications/modules/#search-for-modules","text":"You can search for modules or extensions with spider and avail . For example, to find and list all Python version 3 modules, run: module avail python/3 To find any module or extension that mentions python in its name or description, use the command: module spider python","title":"Search For Modules"},{"location":"clusters-at-yale/applications/modules/#get-module-help","text":"You can get a brief description of a module and the url to the software's homepage by running: module help modulename/version If you don't find a commonly used software package you require, contact us with a software installation request. Otherwise, check out our installation guides to install it for yourself.","title":"Get Module Help"},{"location":"clusters-at-yale/applications/modules/#load-and-unload-modules","text":"","title":"Load and Unload Modules"},{"location":"clusters-at-yale/applications/modules/#load","text":"The module load command modifies your environment so you can use the specified software package(s). This command is case-sensitive to module names. The module load command will load dependencies as needed, you don't need to load them separately. For batch jobs , add module load command(s) to your submission script. 
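A minimal submission script that follows this advice might look like the sketch below. The resource requests and the script name my_analysis.py are placeholders; the module versions are just examples of modules with matching toolchain suffixes, as discussed next.
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1
# load the software the job needs before invoking it
module load Python/3.8.6-GCCcore-10.2.0 BLAST+/2.11.0-GCCcore-10.2.0
python my_analysis.py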
For example, to load Python version 3.8.6 and BLAST+ version 2.11.0 , find modules with matching toolchain suffixes and run the command: module load Python/3.8.6-GCCcore-10.2.0 BLAST+/2.11.0-GCCcore-10.2.0 Lmod will add python and the BLAST commands to your environment. Since both of these modules were built with the GCCcore/10.2.0 toolchain module, they will not load conflicting libraries. Recall you can see the other modules that were loaded by running module list . Module Defaults As new versions of software get installed and others are deprecated , the default module version can change over time. It is best practice to note the specific module versions you are using for a project and load those explicitly, e.g. module load Python/3.8.6-GCCcore-10.2.0 not module load Python . This makes your work more reproducible and less likely to change unexpectedly in the future.","title":"Load"},{"location":"clusters-at-yale/applications/modules/#unload","text":"You can also unload a specific module that you've previously loaded: module unload R Or unload all modules at once with: module purge Purge Lightly module purge will alert you to a sticky module that is always loaded called StdEnv . Avoid unloading StdEnv unless explicitly told to do so, othewise you will lose some important setup for the cluster you are on.","title":"Unload"},{"location":"clusters-at-yale/applications/modules/#module-collections","text":"","title":"Module Collections"},{"location":"clusters-at-yale/applications/modules/#save-collections","text":"It can be a pain to enter a long list of modules every time you return to a project. Module collections allow you to create sets of modules to load together. This method is particularly useful if you have two or more module sets that may conflict with one another. Save a collection of modules by first loading all the modules you want to save together then run: module save environment_name (replace environment_name with something more meaningful to you)","title":"Save Collections"},{"location":"clusters-at-yale/applications/modules/#restore-collections","text":"Load a collection with module restore : module restore environment_name To modify a collection: restore it, make the desired changes by load ing and/or unload ing modules, then save it to the same name.","title":"Restore Collections"},{"location":"clusters-at-yale/applications/modules/#list-collections","text":"To get a list of your collections, run: module savelist","title":"List Collections"},{"location":"clusters-at-yale/applications/modules/#ml-a-convinient-tool","text":"Lmod provides a convinient tool called ml to simplify all of the module commands.","title":"ml: A Convinient Tool"},{"location":"clusters-at-yale/applications/modules/#list-module-loaded","text":"ml","title":"List Module Loaded"},{"location":"clusters-at-yale/applications/modules/#load-modules","text":"ml Python/3.8.6-GCCcore-10.2.0","title":"Load Modules"},{"location":"clusters-at-yale/applications/modules/#unload-modules","text":"ml -Python","title":"Unload Modules"},{"location":"clusters-at-yale/applications/modules/#with-moudle-sub-commands","text":"ml can be used to replace the module command. It can take all the sub-commands from module and works the same way as module does. 
ml load Python R ml unload Python ml spider Python ml avail ml whatis Python ml key Python ml purge ml save test ml restore test","title":"With moudle Sub-commands"},{"location":"clusters-at-yale/applications/modules/#environment-variables","text":"To refer to the directory where the software from a module is stored, you can use the environment variable $EBROOTMODULENAME where MODULENAME is the name of the module in all caps with no spaces. This can be useful for finding the executables, libraries, or readme files that are included with the software: [ netid@node ~ ] $ module load SAMtools [ netid@node ~ ] $ echo $EBVERSIONSAMTOOLS 1 .11 [ netid@node ~ ] $ ls $EBROOTSAMTOOLS bin easybuild include lib lib64 share [ netid@node ~ ] $ ls $EBROOTSAMTOOLS /bin ace2sam maq2sam-short psl2sam.pl soap2sam.pl blast2sam.pl md5fa r2plot.lua vcfutils.lua bowtie2sam.pl md5sum-lite sam2vcf.pl wgsim export2sam.pl novo2sam.pl samtools wgsim_eval.pl interpolate_sam.pl plot-ampliconstats samtools.pl zoom2sam.pl maq2sam-long plot-bamstats seq_cache_populate.pl","title":"Environment Variables"},{"location":"clusters-at-yale/applications/modules/#further-reading","text":"You can view documentation while on the cluster using the command: man module There is even more information at the offical Lmod website and related documentation .","title":"Further Reading"},{"location":"clusters-at-yale/applications/toolchains/","text":"Software Module Toolchains The YCRC uses a framework called EasyBuild to build and install the software you access via the module system . Toolchains When we install software, we use pre-defined build environment modules called toolchains. These are modules that include dependencies like compilers and libraries such as GCC, OpenMPI, CUDA, etc. We do this to keep our build process simpler, and to ensure that sets of software modules loaded together function properly. The two groups of toolchains we use on the YCRC clusters are foss and intel , which hierarchically include some shared sub-toolchains. Toolchains will have versions associated with the version of the compiler and/or when the toolchain was composed. Toolchain names and versions are appended as suffixes in module names. This tells you that a module was built with that toolchain and which other modules are compatible with it. The YCRC maintains a rolling two toolchain version support model. The toolchain versions supported on each cluster are listed in the Module Lifecycle documentation. Free Open Source Software ( foss ) The foss toolchains are versioned with a yearletter scheme, e.g. foss/2020b is the second foss toolchain composed in 2020. Software modules that were built with a sub-toolchain, e.g. GCCcore , are still safe to load with their parents as long as their versions match. The major difference between foss and fosscuda is that fosscuda includes CUDA and builds applications for GPUs by default. You shoould only use fosscuda modules on nodes with GPUs . Below is a tree depicting which toolchains inherit each other. foss: gompi + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompi: GCC + OpenMPI \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection fosscuda: gompic + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompic: gcccuda + CUDA-enabled OpenMPI \u2514\u2500\u2500 gcccuda: GCC + CUDA \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection Intel The YCRC licenses Intel Parallel Studio XE (Intel oneAPI Base & HPC Toolkit coming soon). 
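As with foss, loading the top-level intel toolchain module pulls in the compilers, MPI, and math library it bundles in one step. The version below is only an example; run module avail intel to see which versions are installed on your cluster.
# load an intel toolchain and inspect the sub-toolchain modules it loaded
module load intel/2020b
module list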
The intel and iomkl toolchains are versioned with a yearletter scheme, e.g. intel/2020b is the second intel toolchain composed in 2020. The major difference between iomkl and intel is MPI - intel uses Intel's MPI implementation and iomkl uses OpenMPI. Below is a tree depicting which toolchains inherit each other. iomkl: iompi + Intel Math Kernel Library \u2514\u2500\u2500 iompi: iccifort + OpenMPI \u2514\u2500\u2500 iccifort: Intel compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection intel: iimpi + Intel Math Kernel Library \u2514\u2500\u2500 iimpi: iccifort + Intel MPI \u2514\u2500\u2500 iccifort: Intel C/C++/Fortran compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection What Versions Match? To see what versions of sub-toolchains are compatible with their parents, load a foss or intel module of interest and run module list . [ netid@node ~ ] $ module load foss/2020b [ netid@node ~ ] $ module list Currently Loaded Modules: 1 ) StdEnv ( S ) 7 ) XZ/5.2.5-GCCcore-10.2.0 13 ) OpenMPI/4.0.5-GCC-10.2.0 2 ) GCCcore/10.2.0 8 ) libxml2/2.9.10-GCCcore-10.2.0 14 ) OpenBLAS/0.3.12-GCC-10.2.0 3 ) zlib/1.2.11-GCCcore-10.2.0 9 ) libpciaccess/0.16-GCCcore-10.2.0 15 ) gompi/2020b 4 ) binutils/2.35-GCCcore-10.2.0 10 ) hwloc/2.2.0-GCCcore-10.2.0 16 ) FFTW/3.3.8-gompi-2020b 5 ) GCC/10.2.0 11 ) UCX/1.9.0-GCCcore-10.2.0 17 ) ScaLAPACK/2.1.0-gompi-2020b 6 ) numactl/2.0.13-GCCcore-10.2.0 12 ) libfabric/1.11.0-GCCcore-10.2.0 18 ) foss/2020b Where: S: Module is Sticky, requires --force to unload or purge Here you see that foss/2020b includes GCCcore/10.2.0 , so modules with either the foss-2020b or GCCcore-10.2.0 should be compatible.","title":"Module Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#software-module-toolchains","text":"The YCRC uses a framework called EasyBuild to build and install the software you access via the module system .","title":"Software Module Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#toolchains","text":"When we install software, we use pre-defined build environment modules called toolchains. These are modules that include dependencies like compilers and libraries such as GCC, OpenMPI, CUDA, etc. We do this to keep our build process simpler, and to ensure that sets of software modules loaded together function properly. The two groups of toolchains we use on the YCRC clusters are foss and intel , which hierarchically include some shared sub-toolchains. Toolchains will have versions associated with the version of the compiler and/or when the toolchain was composed. Toolchain names and versions are appended as suffixes in module names. This tells you that a module was built with that toolchain and which other modules are compatible with it. The YCRC maintains a rolling two toolchain version support model. The toolchain versions supported on each cluster are listed in the Module Lifecycle documentation.","title":"Toolchains"},{"location":"clusters-at-yale/applications/toolchains/#free-open-source-software-foss","text":"The foss toolchains are versioned with a yearletter scheme, e.g. foss/2020b is the second foss toolchain composed in 2020. Software modules that were built with a sub-toolchain, e.g. GCCcore , are still safe to load with their parents as long as their versions match. The major difference between foss and fosscuda is that fosscuda includes CUDA and builds applications for GPUs by default. You shoould only use fosscuda modules on nodes with GPUs . Below is a tree depicting which toolchains inherit each other. 
foss: gompi + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompi: GCC + OpenMPI \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection fosscuda: gompic + FFTW, OpenBLAS, ScaLAPACK \u2514\u2500\u2500 gompic: gcccuda + CUDA-enabled OpenMPI \u2514\u2500\u2500 gcccuda: GCC + CUDA \u2514\u2500\u2500 GCC: GCCcore + zlib, binutils \u2514\u2500\u2500 GCCcore: GNU Compiler Collection","title":"Free Open Source Software (foss)"},{"location":"clusters-at-yale/applications/toolchains/#intel","text":"The YCRC licenses Intel Parallel Studio XE (Intel oneAPI Base & HPC Toolkit coming soon). The intel and iomkl toolchains are versioned with a yearletter scheme, e.g. intel/2020b is the second intel toolchain composed in 2020. The major difference between iomkl and intel is MPI - intel uses Intel's MPI implementation and iomkl uses OpenMPI. Below is a tree depicting which toolchains inherit each other. iomkl: iompi + Intel Math Kernel Library \u2514\u2500\u2500 iompi: iccifort + OpenMPI \u2514\u2500\u2500 iccifort: Intel compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection intel: iimpi + Intel Math Kernel Library \u2514\u2500\u2500 iimpi: iccifort + Intel MPI \u2514\u2500\u2500 iccifort: Intel C/C++/Fortran compilers \u2514\u2500\u2500 GCCcore: GNU Compiler Collection","title":"Intel"},{"location":"clusters-at-yale/applications/toolchains/#what-versions-match","text":"To see what versions of sub-toolchains are compatible with their parents, load a foss or intel module of interest and run module list . [ netid@node ~ ] $ module load foss/2020b [ netid@node ~ ] $ module list Currently Loaded Modules: 1 ) StdEnv ( S ) 7 ) XZ/5.2.5-GCCcore-10.2.0 13 ) OpenMPI/4.0.5-GCC-10.2.0 2 ) GCCcore/10.2.0 8 ) libxml2/2.9.10-GCCcore-10.2.0 14 ) OpenBLAS/0.3.12-GCC-10.2.0 3 ) zlib/1.2.11-GCCcore-10.2.0 9 ) libpciaccess/0.16-GCCcore-10.2.0 15 ) gompi/2020b 4 ) binutils/2.35-GCCcore-10.2.0 10 ) hwloc/2.2.0-GCCcore-10.2.0 16 ) FFTW/3.3.8-gompi-2020b 5 ) GCC/10.2.0 11 ) UCX/1.9.0-GCCcore-10.2.0 17 ) ScaLAPACK/2.1.0-gompi-2020b 6 ) numactl/2.0.13-GCCcore-10.2.0 12 ) libfabric/1.11.0-GCCcore-10.2.0 18 ) foss/2020b Where: S: Module is Sticky, requires --force to unload or purge Here you see that foss/2020b includes GCCcore/10.2.0 , so modules with either the foss-2020b or GCCcore-10.2.0 should be compatible.","title":"What Versions Match?"},{"location":"clusters-at-yale/guides/","text":"Guides to Software & Tools The YCRC installs and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. See our software module guide for more information. To see all pre-installed software, you can run module avail on a cluster to page through all available software. For certain software packages, we provide guides for running on our clusters. If you have tips for running a commonly used software and would like to contribute them to our Software Guides, contact us or submit a pull request on the docs repo . Additional Guides For additional guides and tutorials, see our catalog of recommended online tutorials on Python, R, unix commands and more .","title":"Overview"},{"location":"clusters-at-yale/guides/#guides-to-software-tools","text":"The YCRC installs and manage commonly used software. These software are available as modules, which allow you to add or remove different combinations and versions of software to your environment as needed. 
See our software module guide for more information. To see all pre-installed software, you can run module avail on a cluster to page through all available software. For certain software packages, we provide guides for running on our clusters. If you have tips for running a commonly used software and would like to contribute them to our Software Guides, contact us or submit a pull request on the docs repo .","title":"Guides to Software & Tools"},{"location":"clusters-at-yale/guides/#additional-guides","text":"For additional guides and tutorials, see our catalog of recommended online tutorials on Python, R, unix commands and more .","title":"Additional Guides"},{"location":"clusters-at-yale/guides/cesm/","text":"CESM/CAM This is a quick start guide for CESM at Yale. You will still need to read the CESM User Guide and work with your fellow research group members to design and run your simulations, but this guide covers the basics that are specific to running CESM at Yale. CESM User Guides CESM1.0.4 User\u2019s Guide CESM1.1.z User\u2019s Guide CESM User\u2019s Guide (CESM1.2 Release Series User\u2019s Guide) (PDF) Modules CESM 1.0.4, 1.2.2, 2.x are available on Grace. For CESM 2.1.0, load the following modules module load CESM/2.1.0-iomkl-2018a For older versions of CESM, you will need to use the old modules. These old version of CESM do not work with the new modules module use /vast/palmer/apps/old.grace/Modules module avail CESM Once you have located your module, run module load with the module name from above. With either module, the module will configure your environment with the Intel compiler, OpenMPI and NetCDF libraries as well as set the location of the Yale\u2019s repository of CESM input data. If you will be primarily using CESM, you can avoid rerunning the module load command every time you login by saving it to your default environment: module load module save Input Data To reduce the amount of data duplication on the cluster, we keep one centralized repository of CESM input data. The YCRC staff are only people who can add to that directory. If your build fails due to missing inputdata, contact us with your create_newcase line and we will download that data for you. Run CESM CESM needs to be rebuilt separately for each run. As a result, running CESM is more complicated than a standard piece of software where you would just run the executable. Create Your Case Each simulation is called a \u201ccase\u201d. Loading a CESM module will put the create_newcase script in your path, so you can call it as follows. This will create a directory with your case name, that we will refer to as $CASE through out the rest of the guide. create_newcase -case $CASE -compset = -res = -mach = cd $CASE The mach parameters for Grace is yalegrace for CESM 1.0.4 and gracempi for CESM 1.2.2 and CESM 2.x , respectively. For example create_newcase --case $CASE --compset = B1850 --res = f09_g17 --mach = gracempi cd $CASE Setup Your Case If you are making any changes to the namelist files (such as increasing the duration of the simulation), do those before running the setup scripts below. CESM 1.0.X ./configure -case CESM 1.1.X and CESM 1.2.X ./cesm_setup CESM 2.X ./case.setup Build Your Case After you run the setup script, there will be a set of the scripts in your case directory that start with your case name. To compile your simulation executable, first move to an interactive job and then run the build script corresponding to your case. 
# CESM 1.x salloc -c 4 module load # = the appropriate module for your CESM version ./ $CASE . $mach .build # CESM 2.x salloc -c 4 module load # = the appropriate module for your CESM version ./case.build --skip-provenance-check Note the --skip-provenance-check flag is required with CESM 2.x due to the changes made to port the code to Grace. For more details on interactive jobs, see our Slurm documentation . During the build, CESM will create a corresponding directory in your scratch60 or project directory at ls ~/scratch60/CESM/$CASE This directory will contain all the outputs from your simulation as well as logs and the cesm.exe executable. Common Build Issues Make sure you compile on an interactive node as described above. If you build fails, it will direct you to look in a bldlog file. If that log complains that it can\u2019t find mpirun, NetCDF or another library or executable, make sure you have the correct CESM module loaded. It can helpful to run module purge before the module load to ensure a reproducible environment. If you get an error saying ERROR: Error gathering provenance information from manage_externals , rerun the build using the suggested flag, e.g. ./case.build --skip-provenance-check . Submit Your Case Once the build is complete, which can take 5-15 minutes, you can submit your case with the submit script. # CESM 1.x ./ $CASE . $mach .submit # CESM 2.x ./case.submit For more details on monitoring your submitted jobs, see our Slurm documentation . Changing Slurm Partition In CESM 2.x, to change the partition in which your main jobs will run, use the following command: ./xmlchange JOB_QUEUE = scavenge --subgroup case .run The associated archive job will still be submitted to the day partition. Troubleshoot Your Run If your run doesn\u2019t complete, there are a few places to look to identify the error. CESM writes to multiple log files for the different components and you will likely have to look in a few to find the root cause of your error. Slurm Log In your case directory, there will be a file that looks like slurm-.log . Check that file first to make sure the job started up properly. If the last few lines in the file redirect you to look at cpl.log. file in your scratch directory, see below. If there is another error, try to address it and resubmit. CESM Run Logs If the last few lines of the slurm log direct you to look at cpl.log. file, change directory to your case \u201crun\u201d directory (usually in your project directory): cd ~/project/CESM/ $CASE /run The pointer to the cpl file is often misleading as I have found the error is usually located in one of the other logs. Instead look in the cesm.log.xxxxxx file. Towards the end there may be an error or it may signify which component was running. Then look in the log corresponding to that component to track down the issue. One shortcut to finding the relevant logs is to sort the log files by the time to see which ones were last updated: ls -ltr *log* Look at the end of the last couple logs listed and look for an indication of the error. Resolve Errors Once you have identified the lines in the logs corresponding to your error: If your log says something like Disk quota exceeded , your group is out of space in the fileset you are writing to. You can run the getquota script to get details on your disk usage. Your group will need to reduce their usage before you will be able to run successfully. 
If it looks like a model error and you don\u2019t know how to fix it, we strongly recommend Googling your error and/or looking in the CESM forums . If you are still experiencing issues, contact us . Alternative Submission Parameters By default, the submission script will submit to the \"mpi\" partition for 1 day. CESM 1.x To change this in CESM 1.x, edit your case\u2019s run script and change the partition and time. The maximum walltime in the mpi and scavenge partitions is 24 hours. For example: ## scavenge partition #SBATCH --partition=scavenge #SBATCH --time=1- CESM 2.x To change this in CESM 2.x, use ./xmlchange in your run directory. # Change partition to scavenge ./xmlchange JOB_QUEUE=scavenge # Change walltime limit to 2 days (> 24 hours is only available on PI partitions) ./xmlchange JOB_WALLCLOCK_TIME 2-00:00:00 Further Reading We recommend referencing the User Guides listed at the top of this page. CESM User Forum Our Slurm Documentation CESM is a very widely used package, you can often find answers by simply using Google. Just make sure that the solutions you find correspond to the approximate version of CESM you are using. CESM changes in subtle but significant ways between versions.","title":"CESM/CAM"},{"location":"clusters-at-yale/guides/cesm/#cesmcam","text":"This is a quick start guide for CESM at Yale. You will still need to read the CESM User Guide and work with your fellow research group members to design and run your simulations, but this guide covers the basics that are specific to running CESM at Yale.","title":"CESM/CAM"},{"location":"clusters-at-yale/guides/cesm/#cesm-user-guides","text":"CESM1.0.4 User\u2019s Guide CESM1.1.z User\u2019s Guide CESM User\u2019s Guide (CESM1.2 Release Series User\u2019s Guide) (PDF)","title":"CESM User Guides"},{"location":"clusters-at-yale/guides/cesm/#modules","text":"CESM 1.0.4, 1.2.2, 2.x are available on Grace. For CESM 2.1.0, load the following modules module load CESM/2.1.0-iomkl-2018a For older versions of CESM, you will need to use the old modules. These old version of CESM do not work with the new modules module use /vast/palmer/apps/old.grace/Modules module avail CESM Once you have located your module, run module load with the module name from above. With either module, the module will configure your environment with the Intel compiler, OpenMPI and NetCDF libraries as well as set the location of the Yale\u2019s repository of CESM input data. If you will be primarily using CESM, you can avoid rerunning the module load command every time you login by saving it to your default environment: module load module save","title":"Modules"},{"location":"clusters-at-yale/guides/cesm/#input-data","text":"To reduce the amount of data duplication on the cluster, we keep one centralized repository of CESM input data. The YCRC staff are only people who can add to that directory. If your build fails due to missing inputdata, contact us with your create_newcase line and we will download that data for you.","title":"Input Data"},{"location":"clusters-at-yale/guides/cesm/#run-cesm","text":"CESM needs to be rebuilt separately for each run. As a result, running CESM is more complicated than a standard piece of software where you would just run the executable.","title":"Run CESM"},{"location":"clusters-at-yale/guides/cesm/#create-your-case","text":"Each simulation is called a \u201ccase\u201d. Loading a CESM module will put the create_newcase script in your path, so you can call it as follows. 
This will create a directory with your case name, that we will refer to as $CASE through out the rest of the guide. create_newcase -case $CASE -compset = -res = -mach = cd $CASE The mach parameters for Grace is yalegrace for CESM 1.0.4 and gracempi for CESM 1.2.2 and CESM 2.x , respectively. For example create_newcase --case $CASE --compset = B1850 --res = f09_g17 --mach = gracempi cd $CASE","title":"Create Your Case"},{"location":"clusters-at-yale/guides/cesm/#setup-your-case","text":"If you are making any changes to the namelist files (such as increasing the duration of the simulation), do those before running the setup scripts below.","title":"Setup Your Case"},{"location":"clusters-at-yale/guides/cesm/#cesm-10x","text":"./configure -case","title":"CESM 1.0.X"},{"location":"clusters-at-yale/guides/cesm/#cesm-11x-and-cesm-12x","text":"./cesm_setup","title":"CESM 1.1.X and CESM 1.2.X"},{"location":"clusters-at-yale/guides/cesm/#cesm-2x","text":"./case.setup","title":"CESM 2.X"},{"location":"clusters-at-yale/guides/cesm/#build-your-case","text":"After you run the setup script, there will be a set of the scripts in your case directory that start with your case name. To compile your simulation executable, first move to an interactive job and then run the build script corresponding to your case. # CESM 1.x salloc -c 4 module load # = the appropriate module for your CESM version ./ $CASE . $mach .build # CESM 2.x salloc -c 4 module load # = the appropriate module for your CESM version ./case.build --skip-provenance-check Note the --skip-provenance-check flag is required with CESM 2.x due to the changes made to port the code to Grace. For more details on interactive jobs, see our Slurm documentation . During the build, CESM will create a corresponding directory in your scratch60 or project directory at ls ~/scratch60/CESM/$CASE This directory will contain all the outputs from your simulation as well as logs and the cesm.exe executable.","title":"Build Your Case"},{"location":"clusters-at-yale/guides/cesm/#common-build-issues","text":"Make sure you compile on an interactive node as described above. If you build fails, it will direct you to look in a bldlog file. If that log complains that it can\u2019t find mpirun, NetCDF or another library or executable, make sure you have the correct CESM module loaded. It can helpful to run module purge before the module load to ensure a reproducible environment. If you get an error saying ERROR: Error gathering provenance information from manage_externals , rerun the build using the suggested flag, e.g. ./case.build --skip-provenance-check .","title":"Common Build Issues"},{"location":"clusters-at-yale/guides/cesm/#submit-your-case","text":"Once the build is complete, which can take 5-15 minutes, you can submit your case with the submit script. # CESM 1.x ./ $CASE . $mach .submit # CESM 2.x ./case.submit For more details on monitoring your submitted jobs, see our Slurm documentation .","title":"Submit Your Case"},{"location":"clusters-at-yale/guides/cesm/#changing-slurm-partition","text":"In CESM 2.x, to change the partition in which your main jobs will run, use the following command: ./xmlchange JOB_QUEUE = scavenge --subgroup case .run The associated archive job will still be submitted to the day partition.","title":"Changing Slurm Partition"},{"location":"clusters-at-yale/guides/cesm/#troubleshoot-your-run","text":"If your run doesn\u2019t complete, there are a few places to look to identify the error. 
CESM writes to multiple log files for the different components and you will likely have to look in a few to find the root cause of your error.","title":"Troubleshoot Your Run"},{"location":"clusters-at-yale/guides/cesm/#slurm-log","text":"In your case directory, there will be a file that looks like slurm-.log . Check that file first to make sure the job started up properly. If the last few lines in the file redirect you to look at cpl.log. file in your scratch directory, see below. If there is another error, try to address it and resubmit.","title":"Slurm Log"},{"location":"clusters-at-yale/guides/cesm/#cesm-run-logs","text":"If the last few lines of the slurm log direct you to look at cpl.log. file, change directory to your case \u201crun\u201d directory (usually in your project directory): cd ~/project/CESM/ $CASE /run The pointer to the cpl file is often misleading as I have found the error is usually located in one of the other logs. Instead look in the cesm.log.xxxxxx file. Towards the end there may be an error or it may signify which component was running. Then look in the log corresponding to that component to track down the issue. One shortcut to finding the relevant logs is to sort the log files by the time to see which ones were last updated: ls -ltr *log* Look at the end of the last couple logs listed and look for an indication of the error.","title":"CESM Run Logs"},{"location":"clusters-at-yale/guides/cesm/#resolve-errors","text":"Once you have identified the lines in the logs corresponding to your error: If your log says something like Disk quota exceeded , your group is out of space in the fileset you are writing to. You can run the getquota script to get details on your disk usage. Your group will need to reduce their usage before you will be able to run successfully. If it looks like a model error and you don\u2019t know how to fix it, we strongly recommend Googling your error and/or looking in the CESM forums . If you are still experiencing issues, contact us .","title":"Resolve Errors"},{"location":"clusters-at-yale/guides/cesm/#alternative-submission-parameters","text":"By default, the submission script will submit to the \"mpi\" partition for 1 day.","title":"Alternative Submission Parameters"},{"location":"clusters-at-yale/guides/cesm/#cesm-1x","text":"To change this in CESM 1.x, edit your case\u2019s run script and change the partition and time. The maximum walltime in the mpi and scavenge partitions is 24 hours. For example: ## scavenge partition #SBATCH --partition=scavenge #SBATCH --time=1-","title":"CESM 1.x"},{"location":"clusters-at-yale/guides/cesm/#cesm-2x_1","text":"To change this in CESM 2.x, use ./xmlchange in your run directory. # Change partition to scavenge ./xmlchange JOB_QUEUE=scavenge # Change walltime limit to 2 days (> 24 hours is only available on PI partitions) ./xmlchange JOB_WALLCLOCK_TIME 2-00:00:00","title":"CESM 2.x"},{"location":"clusters-at-yale/guides/cesm/#further-reading","text":"We recommend referencing the User Guides listed at the top of this page. CESM User Forum Our Slurm Documentation CESM is a very widely used package, you can often find answers by simply using Google. Just make sure that the solutions you find correspond to the approximate version of CESM you are using. 
CESM changes in subtle but significant ways between versions.","title":"Further Reading"},{"location":"clusters-at-yale/guides/checkpointing/","text":"Checkpoint Long-running Jobs When working with long-running jobs and work-flows, it becomes very important to establish checkpoints along the way. This will ensure that if your job is interrupted you will be able to restart it without having to go back to the begining of the job. DMTCP \"Distributed Multithreaded Checkpointing\" allows you to easily save the state of your running job and restart it from that point. This can be very useful if your job fails for any number of reasons: it exceeds the time limit, is preempted from scavenge, the compute node crashes, etc. DMTCP does not require any changes to your code or recompilation. It should work on most sequential or multithreaded/multiprocessing programs as is. module load DMTCP Run Your Job Interactively Under DMTCP For this simple example, we'll use this python script count.py import time i = 0 while True : print ( i , flush = True ) i += 1 time . sleep ( 1 ) Run the script interactively using dmtcp_launch : dmtcp_launch -i 5 python3 count.py It will begin printing to the terminal. In the background, DMTCP will be writing a checkpoint file every 5 seconds. Let it count for a while, then kill it with Ctrl + c . If you look in that directory, you'll see several files related to DMTCP. The *.dmtcp file is the checkpoint file. To restart the job from the last checkpoint, do: dmtcp_restart -i 5 *.dmtcp In practice, you'll most likely want to use DMTCP to checkpoint batch jobs, rather than interactive sessions. Checkpoint a Batch Job This script will submit the job under DMTCP's checkpointing. Here we use a more reasonable checkpoint interval of 300 seconds. You will want to experiment to see how long it takes to write your application's checkpoint file, and tune your interval accordingly. #!/bin/bash module load DMTCP dmtcp_launch -i 300 python count.py Then, if the job fails, you can resubmit it with this script: #!/bin/bash module load DMTCP dmtcp_restart -i 300 *.dmtcp Note that we are using wildcards to name the DMTCP file, which will obviously only work correctly if there is only one checkpoint file in the directory. Alternatively you can edit the script each time and explicitly name the correct checkpoint file. Restart a Preempted job Here is an example job script that will start a job running, periodically checkpoint it, and automatically requeue the job if it is preempted: #!/bin/bash #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #edit following line to put the appropriate module module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTART_COUNT = $cnt \" dmtcp_coordinator -i 5 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart -j *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Launch the job with sbatch, and watch the numbers appear in the slurm*.out file. Then, simulate preemption by doing: $ scontrol requeue 123456789 Because the script specified --requeue, the job will be returned to pending. Slurm automatically sets a \"Begin Time\" a couple of minutes in the future, so the job will pend until then, at which point it will begin running again, so be patient. This time the script will invoke dmtcp_restart, and will continue from the checkpoint. If you look at the output, you'll see from the numbers that the job backed up to the previous checkpoint and restarted. 
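One convenient way to watch this from a login node, assuming the job ID 123456789 used in the example above, is to check the job state and follow the output file:
# the job returns to PENDING after the requeue, then starts again a few minutes later
squeue -j 123456789
# once it is running again, the counter resumes from the last checkpoint
tail -f slurm-123456789.out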
You can requeue the job several times, and each time it will restart from the last checkpoint. You should be able to adapt this script to your own job by loading any required modules and replacing \"python count.py\" with your program's invocation. This example is much more complicated than our previous examples. Some notes: DMTCP uses a \"controller\" to manage the checkpointing. In the simple example, dmtcp_launch transparently started a controller on the default port 7779. In this case, we explicitly start a \"controller\" on a random port and communicate the port number via an environment variable. This prevents collisions if multiple DMTCP sessions run on the same node. The -j flag to dmtcp_launch tells it to join the existing controller. On initial launch we remove existing checkpoint files. This may not be a good idea in practice. The env var SLURM_RESTART_COUNT is used to determine if this is a restart or not. Parallel Execution with DMTCP DMTCP can checkpoint multithreaded/multiprocess parallel applications. In this example we run NAMD (a molecular dynamics simulation), using 6 threads on 6 cpus. We also restart automatically on preemption, as above. #!/bin/bash #SBATCH -c 6 #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #SBATCH -C haswell #edit following line to put the appropriate module module load NAMD/2.12-multicore module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTARTCOUNT = $cnt \" dmtcp_coordinator -i 90 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_HOST = ` hostname ` export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Additional notes dmtcp reopens files when recovering from checkpoints, so most file writes should just work. However, when requeuing jobs as shown above, you should take care to do #SBATCH --open-mode=append keep in mind that recovery from checkpoints does imply backing up to the point of the previous checkpoint. If your program is continuously writing output, the output since the last checkpoint will be replicated. For many programs (like NAMD) the output is really just logging, so this is not a problem. by default, dmtcp compresses checkpoint files. For large files this can take a long time. You can turn off comporession with dmtcp_launch --no-gzip . dmtcp creates a convenience restart script called restart_dmtcp_script.sh with every checkpoint. In theory you can simply call it to restart: ./restart_dmtcp_script.sh rather than restart_dmtcp *.dmtcp However, we have found it to be unreliable. Your mileage may vary. The above examples just scratch the surface. For more information: A DMTCP quickstart and documentation A very helpful page at NERSC","title":"Checkpoint Long-running Jobs"},{"location":"clusters-at-yale/guides/checkpointing/#checkpoint-long-running-jobs","text":"When working with long-running jobs and work-flows, it becomes very important to establish checkpoints along the way. This will ensure that if your job is interrupted you will be able to restart it without having to go back to the begining of the job. DMTCP \"Distributed Multithreaded Checkpointing\" allows you to easily save the state of your running job and restart it from that point. This can be very useful if your job fails for any number of reasons: it exceeds the time limit, is preempted from scavenge, the compute node crashes, etc. DMTCP does not require any changes to your code or recompilation. 
It should work on most sequential or multithreaded/multiprocessing programs as is. module load DMTCP","title":"Checkpoint Long-running Jobs"},{"location":"clusters-at-yale/guides/checkpointing/#run-your-job-interactively-under-dmtcp","text":"For this simple example, we'll use this python script count.py import time i = 0 while True : print ( i , flush = True ) i += 1 time . sleep ( 1 ) Run the script interactively using dmtcp_launch : dmtcp_launch -i 5 python3 count.py It will begin printing to the terminal. In the background, DMTCP will be writing a checkpoint file every 5 seconds. Let it count for a while, then kill it with Ctrl + c . If you look in that directory, you'll see several files related to DMTCP. The *.dmtcp file is the checkpoint file. To restart the job from the last checkpoint, do: dmtcp_restart -i 5 *.dmtcp In practice, you'll most likely want to use DMTCP to checkpoint batch jobs, rather than interactive sessions.","title":"Run Your Job Interactively Under DMTCP"},{"location":"clusters-at-yale/guides/checkpointing/#checkpoint-a-batch-job","text":"This script will submit the job under DMTCP's checkpointing. Here we use a more reasonable checkpoint interval of 300 seconds. You will want to experiment to see how long it takes to write your application's checkpoint file, and tune your interval accordingly. #!/bin/bash module load DMTCP dmtcp_launch -i 300 python count.py Then, if the job fails, you can resubmit it with this script: #!/bin/bash module load DMTCP dmtcp_restart -i 300 *.dmtcp Note that we are using wildcards to name the DMTCP file, which will obviously only work correctly if there is only one checkpoint file in the directory. Alternatively you can edit the script each time and explicitly name the correct checkpoint file.","title":"Checkpoint a Batch Job"},{"location":"clusters-at-yale/guides/checkpointing/#restart-a-preempted-job","text":"Here is an example job script that will start a job running, periodically checkpoint it, and automatically requeue the job if it is preempted: #!/bin/bash #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #edit following line to put the appropriate module module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTART_COUNT = $cnt \" dmtcp_coordinator -i 5 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart -j *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi Launch the job with sbatch, and watch the numbers appear in the slurm*.out file. Then, simulate preemption by doing: $ scontrol requeue 123456789 Because the script specified --requeue, the job will be returned to pending. Slurm automatically sets a \"Begin Time\" a couple of minutes in the future, so the job will pend until then, at which point it will begin running again, so be patient. This time the script will invoke dmtcp_restart, and will continue from the checkpoint. If you look at the output, you'll see from the numbers that the job backed up to the previous checkpoint and restarted. You can requeue the job several times, and each time it will restart from the last checkpoint. You should be able to adapt this script to your own job by loading any required modules and replacing \"python count.py\" with your program's invocation. This example is much more complicated than our previous examples. Some notes: DMTCP uses a \"controller\" to manage the checkpointing. In the simple example, dmtcp_launch transparently started a controller on the default port 7779. 
In this case, we explicitly start a \"controller\" on a random port and communicate the port number via an environment variable. This prevents collisions if multiple DMTCP sessions run on the same node. The -j flag to dmtcp_launch tells it to join the existing controller. On initial launch we remove existing checkpoint files. This may not be a good idea in practice. The env var SLURM_RESTART_COUNT is used to determine if this is a restart or not.","title":"Restart a Preempted job"},{"location":"clusters-at-yale/guides/checkpointing/#parallel-execution-with-dmtcp","text":"DMTCP can checkpoint multithreaded/multiprocess parallel applications. In this example we run NAMD (a molecular dynamics simulation), using 6 threads on 6 cpus. We also restart automatically on preemption, as above. #!/bin/bash #SBATCH -c 6 #SBATCH -t 30:00 #SBATCH --requeue #SBATCH --open-mode=append #SBATCH -C haswell #edit following line to put the appropriate module module load NAMD/2.12-multicore module load DMTCP cnt = ${ SLURM_RESTART_COUNT :- 0 } echo \"SLURM_RESTARTCOUNT = $cnt \" dmtcp_coordinator -i 90 --daemon --port 0 --port-file /tmp/port export DMTCP_COORD_HOST = ` hostname ` export DMTCP_COORD_PORT = $( 0 ]] ; then echo \"doing restart\" dmtcp_restart *.dmtcp else echo \"Failed to restart the job, exit\" ; exit fi","title":"Parallel Execution with DMTCP"},{"location":"clusters-at-yale/guides/checkpointing/#additional-notes","text":"dmtcp reopens files when recovering from checkpoints, so most file writes should just work. However, when requeuing jobs as shown above, you should take care to do #SBATCH --open-mode=append keep in mind that recovery from checkpoints does imply backing up to the point of the previous checkpoint. If your program is continuously writing output, the output since the last checkpoint will be replicated. For many programs (like NAMD) the output is really just logging, so this is not a problem. by default, dmtcp compresses checkpoint files. For large files this can take a long time. You can turn off comporession with dmtcp_launch --no-gzip . dmtcp creates a convenience restart script called restart_dmtcp_script.sh with every checkpoint. In theory you can simply call it to restart: ./restart_dmtcp_script.sh rather than restart_dmtcp *.dmtcp However, we have found it to be unreliable. Your mileage may vary. The above examples just scratch the surface. For more information: A DMTCP quickstart and documentation A very helpful page at NERSC","title":"Additional notes"},{"location":"clusters-at-yale/guides/clustershell/","text":"ClusterShell ClusterShell is a useful Python package for executing arbitrary commands across multiple hosts. On the Yale clusters it provides a relatively simple way for you to run commands on nodes your jobs are running on, and collect the results. The two most useful commands provided are nodeset , which can show and manipulate node lists and clush , which can run commands on multiple nodes at once. Configuration To set up ClusterShell, make sure you have a .config directory and a copy our groups.conf file there. For more info about ClusterShell configuration for Slurm, see the official docs . mkdir -p ~/.config/clustershell wget https://docs.ycrc.yale.edu/_static/files/clustershell_groups.conf -O ~/.config/clustershell/groups.conf We provide ClusterShell as a module, but you can also install it with conda . 
Module module load ClusterShell Conda module load miniconda conda create -yn clustershell python pip source activate clustershell pip install ClusterShell Examples nodeset The nodeset command uses sinfo underneath but has slightly different syntax. You can use it to ask about node states and nodes your job is running on. The nice difference is you can ask for folded (e.g. c[01-02]n[12,15,18] ) or expanded (e.g. c01n01 c01n02 ... ) node lists. The groups useful to you that we have configured are @user , @job and @state . User group List expanded node names where user abc123 has jobs running # similar to squeue -h -u abc123 -o \"%N\" nodeset -e @user:abc123 Job group List folded nodes where job 1234567 is running # similar to squeue -h -j 1234567 -o \"%N\" nodeset -f @job:1234567 State group List expanded node names that are idle according to slurm # similar to sinfo -t IDLE -o \"%N\" nodeset -e @state:idle clush The clush command uses the node grouping syntax from nodeset to allow you to run commands on those nodes. clush uses ssh to connect to each of these nodes. You can use the -b option to gather output from nodes with same output into the same lines. Leaving this out will report on each node separately. Info You can only ssh to, and therefore run clush on, nodes where you have active jobs. Local storage Get a list of files in /tmp/abs on all nodes where job 654321 is running. clush -bw @job:654321 ls /tmp/abc123 # don't gather identical output clush -w @job:654321 ls /tmp/abc123 CPU usage Show %cpu, memory usage, and command for all nodes running any jobs owned by user abc123 . clush -bw @user:abc123 ps -uabc123 -o%cpu,rss,cmd GPU usage Show what's running on all the GPUs on the nodes associated with your job 654321 . clush -bw @job:654321 nvidia-smi --format = csv --query-compute-apps = process_name,used_gpu_memory","title":"ClusterShell"},{"location":"clusters-at-yale/guides/clustershell/#clustershell","text":"ClusterShell is a useful Python package for executing arbitrary commands across multiple hosts. On the Yale clusters it provides a relatively simple way for you to run commands on nodes your jobs are running on, and collect the results. The two most useful commands provided are nodeset , which can show and manipulate node lists and clush , which can run commands on multiple nodes at once.","title":"ClusterShell"},{"location":"clusters-at-yale/guides/clustershell/#configuration","text":"To set up ClusterShell, make sure you have a .config directory and a copy our groups.conf file there. For more info about ClusterShell configuration for Slurm, see the official docs . mkdir -p ~/.config/clustershell wget https://docs.ycrc.yale.edu/_static/files/clustershell_groups.conf -O ~/.config/clustershell/groups.conf We provide ClusterShell as a module, but you can also install it with conda .","title":"Configuration"},{"location":"clusters-at-yale/guides/clustershell/#module","text":"module load ClusterShell","title":"Module"},{"location":"clusters-at-yale/guides/clustershell/#conda","text":"module load miniconda conda create -yn clustershell python pip source activate clustershell pip install ClusterShell","title":"Conda"},{"location":"clusters-at-yale/guides/clustershell/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/guides/clustershell/#nodeset","text":"The nodeset command uses sinfo underneath but has slightly different syntax. You can use it to ask about node states and nodes your job is running on. The nice difference is you can ask for folded (e.g. 
c[01-02]n[12,15,18] ) or expanded (e.g. c01n01 c01n02 ... ) node lists. The groups useful to you that we have configured are @user , @job and @state .","title":"nodeset"},{"location":"clusters-at-yale/guides/clustershell/#user-group","text":"List expanded node names where user abc123 has jobs running # similar to squeue -h -u abc123 -o \"%N\" nodeset -e @user:abc123","title":"User group"},{"location":"clusters-at-yale/guides/clustershell/#job-group","text":"List folded nodes where job 1234567 is running # similar to squeue -h -j 1234567 -o \"%N\" nodeset -f @job:1234567","title":"Job group"},{"location":"clusters-at-yale/guides/clustershell/#state-group","text":"List expanded node names that are idle according to slurm # similar to sinfo -t IDLE -o \"%N\" nodeset -e @state:idle","title":"State group"},{"location":"clusters-at-yale/guides/clustershell/#clush","text":"The clush command uses the node grouping syntax from nodeset to allow you to run commands on those nodes. clush uses ssh to connect to each of these nodes. You can use the -b option to gather output from nodes with same output into the same lines. Leaving this out will report on each node separately. Info You can only ssh to, and therefore run clush on, nodes where you have active jobs.","title":"clush"},{"location":"clusters-at-yale/guides/clustershell/#local-storage","text":"Get a list of files in /tmp/abs on all nodes where job 654321 is running. clush -bw @job:654321 ls /tmp/abc123 # don't gather identical output clush -w @job:654321 ls /tmp/abc123","title":"Local storage"},{"location":"clusters-at-yale/guides/clustershell/#cpu-usage","text":"Show %cpu, memory usage, and command for all nodes running any jobs owned by user abc123 . clush -bw @user:abc123 ps -uabc123 -o%cpu,rss,cmd","title":"CPU usage"},{"location":"clusters-at-yale/guides/clustershell/#gpu-usage","text":"Show what's running on all the GPUs on the nodes associated with your job 654321 . clush -bw @job:654321 nvidia-smi --format = csv --query-compute-apps = process_name,used_gpu_memory","title":"GPU usage"},{"location":"clusters-at-yale/guides/cmd-line-args/","text":"Pass Values into Jobs A useful tool when running jobs on the clusters is to be able to pass variables into a script without modifying any code. This can include specifying the name of a data file to be processed, or setting a variable to a specific value. Generally, there are two ways of achieving this: environment variables and command-line arguments. Here we will work through how to implement these two approaches in both Python and R. Python Environment Variables In python, environment variables are accessed via the os package ( docs page ). In particular, we can use os.getenv to retrieve environment variables set prior to launching the python script. For example, consider a python script designed to process a data file: def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output We can use an environment variable ( INPUT_DATA_FILE ) to provide the filename of the data to be processed. The python script ( my_script.py ) is modified to retrieve this variable and analyze the given datafile: import os file_name = os . getenv ( \"INPUT_DATA_FILE\" ) def file_cruncher ( file_name ): f = open ( file_name ) data = f . 
read () output = process ( data ) # processing code goes here return output To process this data file, you would simply run: export INPUT_DATA_FILE = /path/to/file/input_0.dat python my_script.py This avoids having to modify the python script to change which datafile is processed, we only need to change the environment variable. Command-line Arguments Similarly, one can use command-line arguments to pass values into a script. In python, there are two main packages designed for handling arguments. First is the simple sys.argv function which parses command-line arguments into a list of strings: import sys for a in sys . argv : print ( a ) Running this with a few arguments: $ python my_script.py a b c my_script.py a b c The first element in sys.argv is the name of the script, and then all subsequent arguments follow. Secondly, there is the more fully-featured argparse package ( docs page )which offers many advanced tools to manage command-line arguments. Take a look at their documentation for examples of how to use argparse . R Just as with Python, R provides comparable utilities to access command-line arguments and environment variables. Environment Variables The Sys.getenv utility ( docs page ) works nearly identically to the Python implementation. > Sys.getenv ( 'HOSTNAME' ) [ 1 ] \"grace2.grace.hpc.yale.internal\" Just like Python, these values are always returned as string representations, so if the variable of interest is a number it will need to be cast into an integer using as.numeric() . Command-line Arguments To collect command-line arguments in R use the commandArgs function: args = commandArgs ( trailingOnly = TRUE ) for ( x in args ){ print ( x ) } The trailingOnly=TRUE option will limit args to contain only those arguments which follow the script: Rscript my_script.R a b c [ 1 ] \"a\" [ 1 ] \"b\" [ 1 ] \"c\" There is a more advanced and detailed package for managing command-line arguments called optparse ( docs page ). This can be used to create more featured scripts in a similar way to Python's argparse . Slurm Environment Variables Slurm sets a number of environment variables detailing the layout of every job. These include: SLURM_JOB_ID : the unique jobid given to each job. Useful to set unique output directories SLURM_CPUS_PER_TASK : the number of CPUs allocated for each task. Useful as a replacement for R's detectCores or Python's multiprocessing.cpu_count which report the physical number of CPUs and not the number allocated by Slurm. SLURM_ARRAY_TASK_ID : the unique array index for each element of a job array. Useful to un-roll a loop or to set a unique random seed for parallel simulations. These can be leveraged within batch scripts using the above techniques to either pass on the command-line or directly reading the environment variable to control how a script runs. For example, if a script previously looped over values ranging from 0-9, we can modify the script and create a job array which runs each iteration separately in parallel using SLURM_ARRAY_TASK_ID to tell each element of the job array which value to use.","title":"Pass Values into Jobs"},{"location":"clusters-at-yale/guides/cmd-line-args/#pass-values-into-jobs","text":"A useful tool when running jobs on the clusters is to be able to pass variables into a script without modifying any code. This can include specifying the name of a data file to be processed, or setting a variable to a specific value. Generally, there are two ways of achieving this: environment variables and command-line arguments. 
Here we will work through how to implement these two approaches in both Python and R.","title":"Pass Values into Jobs"},{"location":"clusters-at-yale/guides/cmd-line-args/#python","text":"","title":"Python"},{"location":"clusters-at-yale/guides/cmd-line-args/#environment-variables","text":"In python, environment variables are accessed via the os package ( docs page ). In particular, we can use os.getenv to retrieve environment variables set prior to launching the python script. For example, consider a python script designed to process a data file: def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output We can use an environment variable ( INPUT_DATA_FILE ) to provide the filename of the data to be processed. The python script ( my_script.py ) is modified to retrieve this variable and analyze the given datafile: import os file_name = os . getenv ( \"INPUT_DATA_FILE\" ) def file_cruncher ( file_name ): f = open ( file_name ) data = f . read () output = process ( data ) # processing code goes here return output To process this data file, you would simply run: export INPUT_DATA_FILE = /path/to/file/input_0.dat python my_script.py This avoids having to modify the python script to change which datafile is processed, we only need to change the environment variable.","title":"Environment Variables"},{"location":"clusters-at-yale/guides/cmd-line-args/#command-line-arguments","text":"Similarly, one can use command-line arguments to pass values into a script. In python, there are two main packages designed for handling arguments. First is the simple sys.argv function which parses command-line arguments into a list of strings: import sys for a in sys . argv : print ( a ) Running this with a few arguments: $ python my_script.py a b c my_script.py a b c The first element in sys.argv is the name of the script, and then all subsequent arguments follow. Secondly, there is the more fully-featured argparse package ( docs page )which offers many advanced tools to manage command-line arguments. Take a look at their documentation for examples of how to use argparse .","title":"Command-line Arguments"},{"location":"clusters-at-yale/guides/cmd-line-args/#r","text":"Just as with Python, R provides comparable utilities to access command-line arguments and environment variables.","title":"R"},{"location":"clusters-at-yale/guides/cmd-line-args/#environment-variables_1","text":"The Sys.getenv utility ( docs page ) works nearly identically to the Python implementation. > Sys.getenv ( 'HOSTNAME' ) [ 1 ] \"grace2.grace.hpc.yale.internal\" Just like Python, these values are always returned as string representations, so if the variable of interest is a number it will need to be cast into an integer using as.numeric() .","title":"Environment Variables"},{"location":"clusters-at-yale/guides/cmd-line-args/#command-line-arguments_1","text":"To collect command-line arguments in R use the commandArgs function: args = commandArgs ( trailingOnly = TRUE ) for ( x in args ){ print ( x ) } The trailingOnly=TRUE option will limit args to contain only those arguments which follow the script: Rscript my_script.R a b c [ 1 ] \"a\" [ 1 ] \"b\" [ 1 ] \"c\" There is a more advanced and detailed package for managing command-line arguments called optparse ( docs page ). 
This can be used to create more featured scripts in a similar way to Python's argparse .","title":"Command-line Arguments"},{"location":"clusters-at-yale/guides/cmd-line-args/#slurm-environment-variables","text":"Slurm sets a number of environment variables detailing the layout of every job. These include: SLURM_JOB_ID : the unique jobid given to each job. Useful to set unique output directories SLURM_CPUS_PER_TASK : the number of CPUs allocated for each task. Useful as a replacement for R's detectCores or Python's multiprocessing.cpu_count which report the physical number of CPUs and not the number allocated by Slurm. SLURM_ARRAY_TASK_ID : the unique array index for each element of a job array. Useful to un-roll a loop or to set a unique random seed for parallel simulations. These can be leveraged within batch scripts using the above techniques to either pass on the command-line or directly reading the environment variable to control how a script runs. For example, if a script previously looped over values ranging from 0-9, we can modify the script and create a job array which runs each iteration separately in parallel using SLURM_ARRAY_TASK_ID to tell each element of the job array which value to use.","title":"Slurm Environment Variables"},{"location":"clusters-at-yale/guides/comsol/","text":"COMSOL YCRC has COMSOL Multiphysics 5.2a available on Grace. It can be used to run basic physical and multiphysics models on one node utilizing multiple cores. If you need to run run models across multiple nodes or need to run COMSOL on your local machine, please contact us . Use COMSOL To use COMSOL on the cluster, load the COMSOL module by running module load COMSOL/5.2a-classkit . For more information on our modules, please see our software modules documentation. COMSOL has a resource intenstive GUI and, therefore, we strongly recommend using COMSOL in a Remote Desktop session on the Open OnDemand web portal . To launch COMSOL in your Remote Desktop, open the terminal application in the session and enter the following commands: module load COMSOL/5.2a-classkit comsol -np $SLURM_CPUS_ON_NODE & Run COMSOL in Batch Mode Comsol can be run without the graphical interface assuming you have a model file and a study defined beforehand. This is particularly useful for parametric sweeps or scanning over a range of values for specific parameters. For example: comsol batch -configuration /tmp -data /tmp -prefsdir /tmp -inputfile mymodel.mph -outputfile out.mph -study std1 which will run the study std1 found within the mymodel.mph file generated through the COMSOL GUI and save the outputs in out.mph . A parameter can be passed into the study like this: comsol batch -inputfile mymodel.mph -outputfile out.mph -pname L -plist 8[cm],10[cm],12[cm] Which will run three versions of the model sequentially for each of the three values of L enumerated. When combined with Slurm Job Arrays many COMSOL jobs can be run in parallel. An example dSQ job-file would look like: module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_8.mph -pname L -plist 8 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_10.mph -pname L -plist 10 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_12.mph -pname L -plist 12 [ cm ] Which would run three versions of the study using different values of L and save their outputs in separate files. Be careful to provide a different output file for each line to avoid clashes between the separate jobs. 
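If you prefer a plain Slurm job array to dSQ, the same sweep can be driven by SLURM_ARRAY_TASK_ID. The script below is a minimal sketch rather than a tested recipe: the array size, resource requests, and list of L values are illustrative and should be adapted to your model.

#!/bin/bash
#SBATCH --job-name=comsol_sweep
#SBATCH --array=0-2
#SBATCH --cpus-per-task=4
#SBATCH -t 2:00:00

module load COMSOL/5.2a-classkit

# illustrative parameter values, one per array element
values=("8[cm]" "10[cm]" "12[cm]")
L=${values[$SLURM_ARRAY_TASK_ID]}

# each array element runs one study and writes its own output file
comsol batch -inputfile mymodel.mph -outputfile out_${SLURM_ARRAY_TASK_ID}.mph -pname L -plist "$L"

As with the dSQ version, each task writes a distinct output file so the runs do not clash.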
More details can be found on the COMSOL documentation site . Details of COMSOL on YCRC Clusters Two COMSOL modules (Heat Transfer and Structural Mechanics) are included in addition to the main multiphysics engine. The following models might be solved using our COMSOL package both in stationary and time dependent studies. AC/DC. Electric Currents and Electrostatics in 1D, 2D, 3D models. Magnetic Fields in 2D. Acoustics. Pressure acoustics in frequency domain in 1D, 2D, 3D models. Chemical Species Transport. Transport of Diluted Species in 1D, 2D, 3D models. Transport and reactions of the species dissolved in a gas, liquid, or solid can be handled with this interface. The driving forces for transport can be diffusion, convection when coupled to a flow field, and migration, when coupled to an electric field. Moisture Transport in 1D, 2D, 3D is used to model moisture transfer in a porous medium. Fluid Flow. Single Phase Laminar and Turbulent Flow including non-isothermal flow in 2D, 3D models. Fluid-Structure Interaction in 2D, 3D models for both fixed geometry and deformed solid. Heat Transfer in 1D, 2D, 3D models. HT in Solids and Fluids. HT in porous media including non-equilibrium transfer. Bioheat transfer. Surface to Surface Radiation. Joule Heating. HT in thin structures (2D, 3D) like shells, films, fractures. Conjugate HT from laminar and turbulent flows (2D, 3D). Heat and moisture transport. Thermoelastic effect. Plasma in 1D. Equilibrium DC Discharges that are sustained by a static or slow-varying electric field where induction currents and fluid flow effects are negligible. Structural Mechanics in 2D, 3D models. Solid Mechanics (elastic). Plate Truss in 2D. Beam, Truss (2D, 3D). Membrane (2D axisymmetric, 3D). Shell (3D). Thermal stress. Thermal expansion. Piezoelectricity. General Mathematics equations in 1D, 2D, 3D models. Classic PDE. Coefficient based and general form PDE. Wave form PDE. Weak form PDE. Ordinary differential equations and algebraic equations. Deformed geometry and moving mesh. Curvilinear coordinates. All above models can be used in the Multiphysics approach of coupling them together. They can be solved in Full Couple mode or by using Segregated Solver (solving one physical model and using resulting field to model another, and so on). Backward Compatibility COMSOL is not backwards compatible. If you have a project file from a newer version of COMSOL (e.g. 5.3), it will not open in 5.2a. However, in some circumstances, we can assist with porting the project file back to version 5.2a. If you have any questions about this, please contact us . Limitations of Available License Please note that some commonly used COMSOL features such as CAD Import Module, Material Library, and MatLab Link are not included in the license. COMSOL Material Library consists of about 2500 different materials with their physical properties. Many of them are included with temperature dependancies. Without this library you have to specify material parameters manually, however, you can save your new material for future use. We can help in adding material form COMSOL library to your project file using a different license. You cannot import geometry designed by external CAD program like SolidWorks, Autocad, etc. Instead you have to design it inside COMSOL. However, we can help you to perform such import utilizing different license; we\u2019ll save it in COMSOL project file and you would be able to open it with already imported geometry. 
More advanced users often use MatLab for automation of COMSOL models and extracting results data for mining them by external methods available in MatLab. Unfortunately, you cannot do this with the license available on the cluster. Please contact us if you feel you need to utilize MatLab. Lastly, our license does not allow to use COMSOL for solving models based on Maxwell Equations (RF, Wave Optics), semiconductor models, particle tracing, ray optics, non-linear mechanics, and some other more advanced modules. To approach such models in COMSOL on your local computer, please contact us to use our more general license with very limited number of licensed seats.","title":"COMSOL"},{"location":"clusters-at-yale/guides/comsol/#comsol","text":"YCRC has COMSOL Multiphysics 5.2a available on Grace. It can be used to run basic physical and multiphysics models on one node utilizing multiple cores. If you need to run run models across multiple nodes or need to run COMSOL on your local machine, please contact us .","title":"COMSOL"},{"location":"clusters-at-yale/guides/comsol/#use-comsol","text":"To use COMSOL on the cluster, load the COMSOL module by running module load COMSOL/5.2a-classkit . For more information on our modules, please see our software modules documentation. COMSOL has a resource intenstive GUI and, therefore, we strongly recommend using COMSOL in a Remote Desktop session on the Open OnDemand web portal . To launch COMSOL in your Remote Desktop, open the terminal application in the session and enter the following commands: module load COMSOL/5.2a-classkit comsol -np $SLURM_CPUS_ON_NODE &","title":"Use COMSOL"},{"location":"clusters-at-yale/guides/comsol/#run-comsol-in-batch-mode","text":"Comsol can be run without the graphical interface assuming you have a model file and a study defined beforehand. This is particularly useful for parametric sweeps or scanning over a range of values for specific parameters. For example: comsol batch -configuration /tmp -data /tmp -prefsdir /tmp -inputfile mymodel.mph -outputfile out.mph -study std1 which will run the study std1 found within the mymodel.mph file generated through the COMSOL GUI and save the outputs in out.mph . A parameter can be passed into the study like this: comsol batch -inputfile mymodel.mph -outputfile out.mph -pname L -plist 8[cm],10[cm],12[cm] Which will run three versions of the model sequentially for each of the three values of L enumerated. When combined with Slurm Job Arrays many COMSOL jobs can be run in parallel. An example dSQ job-file would look like: module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_8.mph -pname L -plist 8 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_10.mph -pname L -plist 10 [ cm ] module load COMSOL ; comsol batch -inputfile mymodel.mph -outputfile out_12.mph -pname L -plist 12 [ cm ] Which would run three versions of the study using different values of L and save their outputs in separate files. Be careful to provide a different output file for each line to avoid clashes between the separate jobs. More details can be found on the COMSOL documentation site .","title":"Run COMSOL in Batch Mode"},{"location":"clusters-at-yale/guides/comsol/#details-of-comsol-on-ycrc-clusters","text":"Two COMSOL modules (Heat Transfer and Structural Mechanics) are included in addition to the main multiphysics engine. The following models might be solved using our COMSOL package both in stationary and time dependent studies. AC/DC. 
Electric Currents and Electrostatics in 1D, 2D, 3D models. Magnetic Fields in 2D. Acoustics. Pressure acoustics in frequency domain in 1D, 2D, 3D models. Chemical Species Transport. Transport of Diluted Species in 1D, 2D, 3D models. Transport and reactions of the species dissolved in a gas, liquid, or solid can be handled with this interface. The driving forces for transport can be diffusion, convection when coupled to a flow field, and migration, when coupled to an electric field. Moisture Transport in 1D, 2D, 3D is used to model moisture transfer in a porous medium. Fluid Flow. Single Phase Laminar and Turbulent Flow including non-isothermal flow in 2D, 3D models. Fluid-Structure Interaction in 2D, 3D models for both fixed geometry and deformed solid. Heat Transfer in 1D, 2D, 3D models. HT in Solids and Fluids. HT in porous media including non-equilibrium transfer. Bioheat transfer. Surface to Surface Radiation. Joule Heating. HT in thin structures (2D, 3D) like shells, films, fractures. Conjugate HT from laminar and turbulent flows (2D, 3D). Heat and moisture transport. Thermoelastic effect. Plasma in 1D. Equilibrium DC Discharges that are sustained by a static or slow-varying electric field where induction currents and fluid flow effects are negligible. Structural Mechanics in 2D, 3D models. Solid Mechanics (elastic). Plate Truss in 2D. Beam, Truss (2D, 3D). Membrane (2D axisymmetric, 3D). Shell (3D). Thermal stress. Thermal expansion. Piezoelectricity. General Mathematics equations in 1D, 2D, 3D models. Classic PDE. Coefficient based and general form PDE. Wave form PDE. Weak form PDE. Ordinary differential equations and algebraic equations. Deformed geometry and moving mesh. Curvilinear coordinates. All above models can be used in the Multiphysics approach of coupling them together. They can be solved in Full Couple mode or by using Segregated Solver (solving one physical model and using resulting field to model another, and so on).","title":"Details of COMSOL on YCRC Clusters"},{"location":"clusters-at-yale/guides/comsol/#backward-compatibility","text":"COMSOL is not backwards compatible. If you have a project file from a newer version of COMSOL (e.g. 5.3), it will not open in 5.2a. However, in some circumstances, we can assist with porting the project file back to version 5.2a. If you have any questions about this, please contact us .","title":"Backward Compatibility"},{"location":"clusters-at-yale/guides/comsol/#limitations-of-available-license","text":"Please note that some commonly used COMSOL features such as CAD Import Module, Material Library, and MatLab Link are not included in the license. COMSOL Material Library consists of about 2500 different materials with their physical properties. Many of them are included with temperature dependancies. Without this library you have to specify material parameters manually, however, you can save your new material for future use. We can help in adding material form COMSOL library to your project file using a different license. You cannot import geometry designed by external CAD program like SolidWorks, Autocad, etc. Instead you have to design it inside COMSOL. However, we can help you to perform such import utilizing different license; we\u2019ll save it in COMSOL project file and you would be able to open it with already imported geometry. More advanced users often use MatLab for automation of COMSOL models and extracting results data for mining them by external methods available in MatLab. 
Unfortunately, you cannot do this with the license available on the cluster. Please contact us if you feel you need to utilize MatLab. Lastly, our license does not allow to use COMSOL for solving models based on Maxwell Equations (RF, Wave Optics), semiconductor models, particle tracing, ray optics, non-linear mechanics, and some other more advanced modules. To approach such models in COMSOL on your local computer, please contact us to use our more general license with very limited number of licensed seats.","title":"Limitations of Available License"},{"location":"clusters-at-yale/guides/conda/","text":"Conda Conda is a package, dependency, and environment manager. It allows you to maintain different, often incompatible, sets of applications side-by-side. It has become a popular choice for managing pipelines that involve several tools, especially when multiple languages are involved. These sets of applications and their dependencies are kept in Conda environments, which you can switch between as your work dictates. Compared to the modules that we provide, there are often newer and more varied packages available that you can manage yourself, but they may not be as well optimized for the clusters. See Conda's official command-line reference and the offical docs for managing environments for detailed instructions. Here we present essential instructions and site-specific info. Warning Mixing modules and conda-managed software is almost never a good idea. When constructing an environment for your work you should load either modules or a conda environment. If you get stuck, you can always ask us for help . The Miniconda Module For your convenience, we provide a relatively recent version of Miniconda as a module. This is a read-only environment from which you can create your own. We set some defaults for you in this module, and we keep it relatively up-to-date so you don't have to. If you are using Conda-installed packages, this should be the only module you load in your jobs. Note: If you are on Milgram and run out of space in your home directory for Conda, you can either reinstall your environment in your project space (see below) or contact us for help with your home quota. Defaults We Set On all clusters, we set the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables to conda_envs and conda_pkgs in your project directory where there is more quota available. Conda will install to and search in these directories for environments and cached packages. Starting with minconda module version 4.8.3 we set the default channels (the sources to find packages) to conda-forge and bioconda , which provide a wider array of packages than the default channels do. We have found it saves a lot of typing. If you would like to override these defaults, see the Conda docs on managing channels. Below is the .condarc for the miniconda module. env_prompt : '({name})' auto_activate_base : false channels : - conda-forge - bioconda - defaults Setup Your Environment Load the miniconda Module module load miniconda You can save this to your default module collection by using module save . See our module documentation for more details. Create a conda Environment To create an environment use the conda create command. Environment files are saved to the first path in $CONDA_ENVS_PATH , or where you specify with the --prefix option. You should give your environments names that are meaningful to you, so you can more easily keep track of their purposes. 
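As a concrete sketch, either form works; the environment name, prefix path, and package versions below are placeholders, not recommendations.

module load miniconda
# create a named environment (stored under the first path in $CONDA_ENVS_PATH)
conda create -n my_analysis python=3.10 numpy
# or build the same environment at an explicit location with --prefix (path is a placeholder)
conda create --prefix ~/project/conda_envs/my_analysis python=3.10 numpy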
Because dependency resolution is hard and messy, we find specifying as many packages as possible at environment creation time can help minimize broken dependencies. Although sometimes unavoidable for Python, we recommend against heavily mixing the use of conda and pip to install applications. If needed, try to get as much installed with conda , then use pip to get the rest of the way to your desired environment. Tip For added reproducibility and control, specify versions of packages to be installed using conda with packagename=version syntax. E.g. numpy=1.14 For example, if you have a legacy application that needs Python 2 and OpenBLAS: module load miniconda conda create -n legacy_application python = 2 .7 openblas If you want a good starting point for interactive data science in R/Python Jupyter Notebooks: module load miniconda conda create -n ds_notebook python numpy scipy pandas matplotlib ipython jupyter r-irkernel r-ggplot2 r-tidyverse Note that you can also install jupyterlab instead of, or alongside jupyter. Conda Channels Community-lead collections of packages that you can install with conda are provided with channels. Some labs will provide their own software using this method. A few popular examples are Conda Forge and Bioconda , which we set for you by default. See the Conda docs for more info about managing channels. You can create a new environment called brian2 (specified with the -n option) and install Brian2 into it with the following: module load miniconda conda create -n brian2 brian2 # normally you would need this: # conda create -n brian2 --channel conda-forge brian2 You can also install packages from Bioconda, for example: module load miniconda conda create -n bioinfo biopython bedtools bowtie2 repeatmasker # normally you would need this: # conda create -n bioinfo --channel conda-forge --channel bioconda biopython bedtools bowtie2 repeatmasker Mamba: The Conda Alternative For complicated environments, conda can often strugle to \"solve\" the required set of packages in a reasonable time. An alternative tool, called mamba , has been developed, bringing a faster dependency solver based on libsolv , which is used in modern RPM package managers. mamba is a drop-in replacement for conda and environments can be created or new packages installed in the same way as with conda : module load miniconda # create new environment mamba create --name env_name python numpy pandas jupyter # install new pacakge into existing environment conda activate env_name mamba install scipy scikit-learn The mamba utility is installed in the YCRC base environment and is available for general use. For more details, see the Mamba GitHub page . Use Your Environment To use the applications in your environment, run the following: module load miniconda conda activate env_name Warning We recommend against putting source activate or conda activate commands in your ~/.bashrc file. This can lead to issues in interactive or batch jobs. If you have issues with an environment, trying re-loading the environment by calling conda deactivate before rerunning conda activate env_name . Interactive Your Conda environments will not follow you into job allocations. Make sure to activate them after your interactive job begins. 
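For example, a minimal sketch of an interactive workflow, assuming you request the session with salloc and have already created an environment named env_name:

salloc --cpus-per-task=2 -t 2:00:00
module load miniconda
conda activate env_name
python analyses.py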
In a Job Script To make sure that you are running in your project environment in a submission script, make sure to include the following lines in your submission script before running any other commands or scripts (but after your Slurm directives ): #!/bin/bash #SBATCH --partition=general #SBATCH --job-name=my_conda_job #SBATCH --cpus-per-task 4 #SBATCH --mem-per-cpu=6000 module load miniconda conda activate env_name python analyses.py Find and Install Additional Packages You can search Anaconda Cloud or use conda search to find the names of packages you would like to install: module load miniconda conda search numpy Compiling Codes You may need to compile codes in a conda environment, for example, installing an R package in a conda R env. This requires you to have the GNU C compiler and its development libraries installed in the conda env before compiling any codes: conda install gcc_linux-64 Without gcc_linux-64 , the code will be compiled using the system compiler and libraries. You will experience run-time errors when running the code in the conda environment. Troubleshoot Conda version doesn't match the module loaded If you have run conda init in the past, you may be locked to an old version of conda . You can run the following to fix this: sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc Permission Denied If you get a permission denied error when running conda install or pip install for a package, make sure you have created an environment and activated it or activated an existing one first. bash: conda: No such file or directory If you get the above error, it is likely that you don't have the necessary module file loaded. Try loading the minconda module and rerunning your conda activate env_name command. Could not find environment This error means that the version of miniconda you have loaded doesn't recognize the environment name you have supplied. Make sure you have the miniconda module loaded (and not a different Python module) and have previously created this environment. You can see a list of previously created environments by running: module load miniconda conda info --envs Additional Conda Commands List Installed Packages module load miniconda conda list --name env_name Delete a Conda Environment module load miniconda conda remove --name env_name --all Save and Export Environments There are two concepts for rebuilding conda environments: a copy of an existing environment, with identical versions of each package a fresh build following the same steps taken to creat the first environment (letting unspecified versions float) This short doc will walk through recommended approaches to both styles of exporting and rebuilding a generic environment named test containing python, jupyter, numpy, and scipy. Full Export Including Dependencies To export the exact versions of each package installed (including all dependencies) run: module load miniconda conda env export --no-builds --name test | grep -v prefix > test_export.yaml This yaml file is ~230 lines long and contains every package that is installed in the test environment. The conda export command includes information about the path where it was installed (i.e. the prefix ). To remove this hard-coded path, we need to remove the line in this print out related to the \"prefix\". Export Only Specified Packages If we simply wish to rebuild the environment using the steps previously employed to create it, we can replace --no-builds with --from-history . 
module load miniconda conda env export --from-history --name test | grep -v prefix > test_export.yaml This is a much smaller file, ~10 lines, and only lists the packages explicitly installed: name: test channels: - conda-forge - defaults - bioconda dependencies: - scipy - numpy=1.21 - jupyter - python=3.8 In this environment, the versions of python and numpy were pinned during installation, but scipy and jupyter were left to get the most recent compatible version. Build a New Environment To create a new environment using all the enumerated pacakges: module load miniconda conda env create --file test_export.yaml This will create a new environment with the same name test . The yaml file can be edited to change the name of the new environment.","title":"Conda"},{"location":"clusters-at-yale/guides/conda/#conda","text":"Conda is a package, dependency, and environment manager. It allows you to maintain different, often incompatible, sets of applications side-by-side. It has become a popular choice for managing pipelines that involve several tools, especially when multiple languages are involved. These sets of applications and their dependencies are kept in Conda environments, which you can switch between as your work dictates. Compared to the modules that we provide, there are often newer and more varied packages available that you can manage yourself, but they may not be as well optimized for the clusters. See Conda's official command-line reference and the offical docs for managing environments for detailed instructions. Here we present essential instructions and site-specific info. Warning Mixing modules and conda-managed software is almost never a good idea. When constructing an environment for your work you should load either modules or a conda environment. If you get stuck, you can always ask us for help .","title":"Conda"},{"location":"clusters-at-yale/guides/conda/#the-miniconda-module","text":"For your convenience, we provide a relatively recent version of Miniconda as a module. This is a read-only environment from which you can create your own. We set some defaults for you in this module, and we keep it relatively up-to-date so you don't have to. If you are using Conda-installed packages, this should be the only module you load in your jobs. Note: If you are on Milgram and run out of space in your home directory for Conda, you can either reinstall your environment in your project space (see below) or contact us for help with your home quota.","title":"The Miniconda Module"},{"location":"clusters-at-yale/guides/conda/#defaults-we-set","text":"On all clusters, we set the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables to conda_envs and conda_pkgs in your project directory where there is more quota available. Conda will install to and search in these directories for environments and cached packages. Starting with minconda module version 4.8.3 we set the default channels (the sources to find packages) to conda-forge and bioconda , which provide a wider array of packages than the default channels do. We have found it saves a lot of typing. If you would like to override these defaults, see the Conda docs on managing channels. Below is the .condarc for the miniconda module. 
env_prompt : '({name})' auto_activate_base : false channels : - conda-forge - bioconda - defaults","title":"Defaults We Set"},{"location":"clusters-at-yale/guides/conda/#setup-your-environment","text":"","title":"Setup Your Environment"},{"location":"clusters-at-yale/guides/conda/#load-the-miniconda-module","text":"module load miniconda You can save this to your default module collection by using module save . See our module documentation for more details.","title":"Load the miniconda Module"},{"location":"clusters-at-yale/guides/conda/#create-a-conda-environment","text":"To create an environment use the conda create command. Environment files are saved to the first path in $CONDA_ENVS_PATH , or where you specify with the --prefix option. You should give your environments names that are meaningful to you, so you can more easily keep track of their purposes. Because dependency resolution is hard and messy, we find specifying as many packages as possible at environment creation time can help minimize broken dependencies. Although sometimes unavoidable for Python, we recommend against heavily mixing the use of conda and pip to install applications. If needed, try to get as much installed with conda , then use pip to get the rest of the way to your desired environment. Tip For added reproducibility and control, specify versions of packages to be installed using conda with packagename=version syntax. E.g. numpy=1.14 For example, if you have a legacy application that needs Python 2 and OpenBLAS: module load miniconda conda create -n legacy_application python = 2 .7 openblas If you want a good starting point for interactive data science in R/Python Jupyter Notebooks: module load miniconda conda create -n ds_notebook python numpy scipy pandas matplotlib ipython jupyter r-irkernel r-ggplot2 r-tidyverse Note that you can also install jupyterlab instead of, or alongside jupyter.","title":"Create a conda Environment"},{"location":"clusters-at-yale/guides/conda/#conda-channels","text":"Community-lead collections of packages that you can install with conda are provided with channels. Some labs will provide their own software using this method. A few popular examples are Conda Forge and Bioconda , which we set for you by default. See the Conda docs for more info about managing channels. You can create a new environment called brian2 (specified with the -n option) and install Brian2 into it with the following: module load miniconda conda create -n brian2 brian2 # normally you would need this: # conda create -n brian2 --channel conda-forge brian2 You can also install packages from Bioconda, for example: module load miniconda conda create -n bioinfo biopython bedtools bowtie2 repeatmasker # normally you would need this: # conda create -n bioinfo --channel conda-forge --channel bioconda biopython bedtools bowtie2 repeatmasker","title":"Conda Channels"},{"location":"clusters-at-yale/guides/conda/#mamba-the-conda-alternative","text":"For complicated environments, conda can often strugle to \"solve\" the required set of packages in a reasonable time. An alternative tool, called mamba , has been developed, bringing a faster dependency solver based on libsolv , which is used in modern RPM package managers. 
mamba is a drop-in replacement for conda and environments can be created or new packages installed in the same way as with conda : module load miniconda # create new environment mamba create --name env_name python numpy pandas jupyter # install new pacakge into existing environment conda activate env_name mamba install scipy scikit-learn The mamba utility is installed in the YCRC base environment and is available for general use. For more details, see the Mamba GitHub page .","title":"Mamba: The Conda Alternative"},{"location":"clusters-at-yale/guides/conda/#use-your-environment","text":"To use the applications in your environment, run the following: module load miniconda conda activate env_name Warning We recommend against putting source activate or conda activate commands in your ~/.bashrc file. This can lead to issues in interactive or batch jobs. If you have issues with an environment, trying re-loading the environment by calling conda deactivate before rerunning conda activate env_name .","title":"Use Your Environment"},{"location":"clusters-at-yale/guides/conda/#interactive","text":"Your Conda environments will not follow you into job allocations. Make sure to activate them after your interactive job begins.","title":"Interactive"},{"location":"clusters-at-yale/guides/conda/#in-a-job-script","text":"To make sure that you are running in your project environment in a submission script, make sure to include the following lines in your submission script before running any other commands or scripts (but after your Slurm directives ): #!/bin/bash #SBATCH --partition=general #SBATCH --job-name=my_conda_job #SBATCH --cpus-per-task 4 #SBATCH --mem-per-cpu=6000 module load miniconda conda activate env_name python analyses.py","title":"In a Job Script"},{"location":"clusters-at-yale/guides/conda/#find-and-install-additional-packages","text":"You can search Anaconda Cloud or use conda search to find the names of packages you would like to install: module load miniconda conda search numpy","title":"Find and Install Additional Packages"},{"location":"clusters-at-yale/guides/conda/#compiling-codes","text":"You may need to compile codes in a conda environment, for example, installing an R package in a conda R env. This requires you to have the GNU C compiler and its development libraries installed in the conda env before compiling any codes: conda install gcc_linux-64 Without gcc_linux-64 , the code will be compiled using the system compiler and libraries. You will experience run-time errors when running the code in the conda environment.","title":"Compiling Codes"},{"location":"clusters-at-yale/guides/conda/#troubleshoot","text":"","title":"Troubleshoot"},{"location":"clusters-at-yale/guides/conda/#conda-version-doesnt-match-the-module-loaded","text":"If you have run conda init in the past, you may be locked to an old version of conda . You can run the following to fix this: sed -i.bak -ne '/# >>> conda init/,/# <<< conda init/!p' ~/.bashrc","title":"Conda version doesn't match the module loaded"},{"location":"clusters-at-yale/guides/conda/#permission-denied","text":"If you get a permission denied error when running conda install or pip install for a package, make sure you have created an environment and activated it or activated an existing one first.","title":"Permission Denied"},{"location":"clusters-at-yale/guides/conda/#bash-conda-no-such-file-or-directory","text":"If you get the above error, it is likely that you don't have the necessary module file loaded. 
Try loading the minconda module and rerunning your conda activate env_name command.","title":"bash: conda: No such file or directory"},{"location":"clusters-at-yale/guides/conda/#could-not-find-environment","text":"This error means that the version of miniconda you have loaded doesn't recognize the environment name you have supplied. Make sure you have the miniconda module loaded (and not a different Python module) and have previously created this environment. You can see a list of previously created environments by running: module load miniconda conda info --envs","title":"Could not find environment"},{"location":"clusters-at-yale/guides/conda/#additional-conda-commands","text":"","title":"Additional Conda Commands"},{"location":"clusters-at-yale/guides/conda/#list-installed-packages","text":"module load miniconda conda list --name env_name","title":"List Installed Packages"},{"location":"clusters-at-yale/guides/conda/#delete-a-conda-environment","text":"module load miniconda conda remove --name env_name --all","title":"Delete a Conda Environment"},{"location":"clusters-at-yale/guides/conda/#save-and-export-environments","text":"There are two concepts for rebuilding conda environments: a copy of an existing environment, with identical versions of each package a fresh build following the same steps taken to creat the first environment (letting unspecified versions float) This short doc will walk through recommended approaches to both styles of exporting and rebuilding a generic environment named test containing python, jupyter, numpy, and scipy.","title":"Save and Export Environments"},{"location":"clusters-at-yale/guides/conda/#full-export-including-dependencies","text":"To export the exact versions of each package installed (including all dependencies) run: module load miniconda conda env export --no-builds --name test | grep -v prefix > test_export.yaml This yaml file is ~230 lines long and contains every package that is installed in the test environment. The conda export command includes information about the path where it was installed (i.e. the prefix ). To remove this hard-coded path, we need to remove the line in this print out related to the \"prefix\".","title":"Full Export Including Dependencies"},{"location":"clusters-at-yale/guides/conda/#export-only-specified-packages","text":"If we simply wish to rebuild the environment using the steps previously employed to create it, we can replace --no-builds with --from-history . module load miniconda conda env export --from-history --name test | grep -v prefix > test_export.yaml This is a much smaller file, ~10 lines, and only lists the packages explicitly installed: name: test channels: - conda-forge - defaults - bioconda dependencies: - scipy - numpy=1.21 - jupyter - python=3.8 In this environment, the versions of python and numpy were pinned during installation, but scipy and jupyter were left to get the most recent compatible version.","title":"Export Only Specified Packages"},{"location":"clusters-at-yale/guides/conda/#build-a-new-environment","text":"To create a new environment using all the enumerated pacakges: module load miniconda conda env create --file test_export.yaml This will create a new environment with the same name test . The yaml file can be edited to change the name of the new environment.","title":"Build a New Environment"},{"location":"clusters-at-yale/guides/containers/","text":"Containers Warning The Singularity project has been renamed Apptainer . 
Everything should still work the same, including the 'singularity' command. If you find it not working as expected, please contact us . Apptainer (formerly Singularity) is a Linux container technology that is well suited to use in shared-user environments such as the clusters we maintain at Yale. It is similar to Docker ; You can bring with you a stack of software, libraries, and a Linux operating system that is independent of the host computer you run the container on. This can be very useful if you want to share your software environment with other researchers or yourself across several computers. Because Apptainer containers run as the user that started them and mount home directories by default, you can usually see the data you're interested in working on that is stored on a host computer without any extra work. Below we will outline some common use cases covering the creation and use of containers. There is also excellent documentation available on the full and official user guide for Apptainer . We are happy to help, just contact us with your questions. Warning On the Yale clusters, Apptainer is not installed on login nodes. You will need to run it from compute nodes. Apptainer Containers Images are the file(s) you use to run your container. Apptainer images are single files that usually end in .sif and are read-only by default, meaning changes you make to the environment inside the container are not persistent. Use a Pre-existing Container If someone has already built a container that suits your needs, you can use it directly. Apptainer images are single files that can be transferred to the clusters. You can fetch images from container registries such as Docker Hub or NVidia Container Registry . Container images can take up a lot of disk space (dozens of gigabytes), so you may want to change the default location Apptainer uses to cache these files. To do this before getting started, you should add something like the example below to to your ~/.bashrc file: # set APPTAINER_CACHEDIR if you want to pull files (which can get big) somewhere other than $HOME/.apptainer # e.g. export APPTAINER_CACHEDIR = ~/scratch60/.apptainer Here are some examples of getting containers already built by someone else with apptainer: # from Docker Hub (https://hub.docker.com/) apptainer build ubuntu-18.10.sif docker://ubuntu:18.10 apptainer build tensorflow-10.0-py3.sif docker://tensorflow/tensorflow:1.10.0-py3 # from Singularity Hub (no longer updated) apptainer build bioconvert-latest.sif shub://biokit/bioconvert:latest Build Your Own Container You can define a container image to be exactly how you want/need it to be, including applications, libraries, and files of your choosing with a definition file . Apptainer definition files are similar to Docker's Dockerfile , but use different syntax. For full definition files and more documentation please see the Apptainer site . Header Every container definition must begin with a header that defines what image to start with, or bootstrap from. This can be an official Linux distribution or someone else's container that gets you nearly what you want. To start from Ubuntu Bionic Beaver (18.04 LTS): Bootstrap: docker From: ubuntu:18.04 Or an Nvidia developer image Bootstrap: docker From: nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04 The rest of the sections all begin with % and the section name. You will see section contents indented by convention, but this is not required. 
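Before looking at each section in turn, here is a minimal sketch of how a complete definition file fits together; the base image, packages, and metadata are illustrative only.

Bootstrap: docker
From: ubuntu:18.04

%labels
    Maintainer "Your Name"

%post
    apt-get update
    apt-get -y install python3

%environment
    export LC_ALL=C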
%labels The labels section allows you to define metadata for your container: %labels Name Maintainer \"YCRC Support Team\" Version v99.9 Architecture x86_64 URL https://research.computing.yale.edu/ You can examine container metadata with the apptainer inspect command. %files If you'd like to copy any files from the system you are building on, you do so in the %files section. Each line in the files section is a pair of source and destination paths, where the source is on your host system, and destination is a path in the container. %files sample_data.tar /opt/sample_data/ example_script.sh /opt/sample_data/ %post The post section is where you can run updates, installs, etc in your container to customize it. %post echo \"Customizing Ubuntu\" apt-get update apt-get -y install software-properties-common build-essential cmake add-apt-repository universe apt-get update apt-get -y libboost-all-dev libgl1-mesa-dev libglu1-mesa-dev cd /tmp git clone https://github.com/gitdudette/myapp && cd myapp # ... etc etc %environment The environment section allows you to define environment variables for your container. These variables are available when you run the built container, not during its build. %environment export PATH = /opt/my_app/bin: $PATH export LD_LIBRARY_PATH = /opt/my_app/lib: $LD_LIBRARY_PATH Building To finally build your container after saving your definition file as my_app.def , for example, you would run apptainer build my_app.sif my_app.def Use a Container Image Once you have a container image, you can run it as a part of a batch job, or interactively. Interactively To get a shell in a container so you can interactively work in its environment: apptainer shell --shell /bin/bash containername.sif In a Job Script You can also run applications from your container non-interactively as you would in a batch job. If I wanted to run a script called my_script.py using my container's python: apptainer exec containername.sif python my_script.py Environment Variables If you are unsure if you are running inside or outside your container, you can run: echo $APPTAINER_NAME If you get back text, you are in your container. If you'd like to pass environment variables into your container, you can do so by defining them prefixed with APPTAINERENV_ . For Example: export APPTAINERENV_BLASTDB = /home/me/db/blast apptainer exec my_blast_image.sif env | grep BLAST Should return BLASTDB=/home/me/db/blast , which means you set the BLASTDB environment variable in the container properly. Additional Notes MPI MPI support for Apptainer is relatively straight-forward. The only thing to watch is to make sure that you are using the same version of MPI inside your container as you are on the cluster. GPUs You can use GPU-accelerated code inside your container, which will need most everything also installed in your container (e.g. CUDA, cuDNN). In order for your applications to have access to the right drivers on the host machine, use the --nv flag. For example: apptainer exec --nv tensorflow-10.0-py3.sif python ./my-tf-model.py Home Directories Sometimes the maintainer of a Docker container you are trying to use installed software into a special user's home directory. If you need access to someone's home directory that exists in the container and not on the host, you should add the --contain option. Unfortunately, you will also then have to explicitly tell Apptainer about the paths that you want to use from inside the container with the --bind option. 
apptainer shell --shell /bin/bash --contain --bind /gpfs/gibbs/project/support/be59:/home/be59/project bioconvert-latest.sif","title":"Containers"},{"location":"clusters-at-yale/guides/containers/#containers","text":"Warning The Singularity project has been renamed Apptainer . Everything should still work the same, including the 'singularity' command. If you find it not working as expected, please contact us . Apptainer (formerly Singularity) is a Linux container technology that is well suited to use in shared-user environments such as the clusters we maintain at Yale. It is similar to Docker ; You can bring with you a stack of software, libraries, and a Linux operating system that is independent of the host computer you run the container on. This can be very useful if you want to share your software environment with other researchers or yourself across several computers. Because Apptainer containers run as the user that started them and mount home directories by default, you can usually see the data you're interested in working on that is stored on a host computer without any extra work. Below we will outline some common use cases covering the creation and use of containers. There is also excellent documentation available on the full and official user guide for Apptainer . We are happy to help, just contact us with your questions. Warning On the Yale clusters, Apptainer is not installed on login nodes. You will need to run it from compute nodes.","title":"Containers"},{"location":"clusters-at-yale/guides/containers/#apptainer-containers","text":"Images are the file(s) you use to run your container. Apptainer images are single files that usually end in .sif and are read-only by default, meaning changes you make to the environment inside the container are not persistent.","title":"Apptainer Containers"},{"location":"clusters-at-yale/guides/containers/#use-a-pre-existing-container","text":"If someone has already built a container that suits your needs, you can use it directly. Apptainer images are single files that can be transferred to the clusters. You can fetch images from container registries such as Docker Hub or NVidia Container Registry . Container images can take up a lot of disk space (dozens of gigabytes), so you may want to change the default location Apptainer uses to cache these files. To do this before getting started, you should add something like the example below to to your ~/.bashrc file: # set APPTAINER_CACHEDIR if you want to pull files (which can get big) somewhere other than $HOME/.apptainer # e.g. export APPTAINER_CACHEDIR = ~/scratch60/.apptainer Here are some examples of getting containers already built by someone else with apptainer: # from Docker Hub (https://hub.docker.com/) apptainer build ubuntu-18.10.sif docker://ubuntu:18.10 apptainer build tensorflow-10.0-py3.sif docker://tensorflow/tensorflow:1.10.0-py3 # from Singularity Hub (no longer updated) apptainer build bioconvert-latest.sif shub://biokit/bioconvert:latest","title":"Use a Pre-existing Container"},{"location":"clusters-at-yale/guides/containers/#build-your-own-container","text":"You can define a container image to be exactly how you want/need it to be, including applications, libraries, and files of your choosing with a definition file . Apptainer definition files are similar to Docker's Dockerfile , but use different syntax. 
For full definition files and more documentation please see the Apptainer site .","title":"Build Your Own Container"},{"location":"clusters-at-yale/guides/containers/#header","text":"Every container definition must begin with a header that defines what image to start with, or bootstrap from. This can be an official Linux distribution or someone else's container that gets you nearly what you want. To start from Ubuntu Bionic Beaver (18.04 LTS): Bootstrap: docker From: ubuntu:18.04 Or an Nvidia developer image Bootstrap: docker From: nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04 The rest of the sections all begin with % and the section name. You will see section contents indented by convention, but this is not required.","title":"Header"},{"location":"clusters-at-yale/guides/containers/#labels","text":"The labels section allows you to define metadata for your container: %labels Name Maintainer \"YCRC Support Team\" Version v99.9 Architecture x86_64 URL https://research.computing.yale.edu/ You can examine container metadata with the apptainer inspect command.","title":"%labels"},{"location":"clusters-at-yale/guides/containers/#files","text":"If you'd like to copy any files from the system you are building on, you do so in the %files section. Each line in the files section is a pair of source and destination paths, where the source is on your host system, and destination is a path in the container. %files sample_data.tar /opt/sample_data/ example_script.sh /opt/sample_data/","title":"%files"},{"location":"clusters-at-yale/guides/containers/#post","text":"The post section is where you can run updates, installs, etc in your container to customize it. %post echo \"Customizing Ubuntu\" apt-get update apt-get -y install software-properties-common build-essential cmake add-apt-repository universe apt-get update apt-get -y libboost-all-dev libgl1-mesa-dev libglu1-mesa-dev cd /tmp git clone https://github.com/gitdudette/myapp && cd myapp # ... etc etc","title":"%post"},{"location":"clusters-at-yale/guides/containers/#environment","text":"The environment section allows you to define environment variables for your container. These variables are available when you run the built container, not during its build. %environment export PATH = /opt/my_app/bin: $PATH export LD_LIBRARY_PATH = /opt/my_app/lib: $LD_LIBRARY_PATH","title":"%environment"},{"location":"clusters-at-yale/guides/containers/#building","text":"To finally build your container after saving your definition file as my_app.def , for example, you would run apptainer build my_app.sif my_app.def","title":"Building"},{"location":"clusters-at-yale/guides/containers/#use-a-container-image","text":"Once you have a container image, you can run it as a part of a batch job, or interactively.","title":"Use a Container Image"},{"location":"clusters-at-yale/guides/containers/#interactively","text":"To get a shell in a container so you can interactively work in its environment: apptainer shell --shell /bin/bash containername.sif","title":"Interactively"},{"location":"clusters-at-yale/guides/containers/#in-a-job-script","text":"You can also run applications from your container non-interactively as you would in a batch job. 
If I wanted to run a script called my_script.py using my container's python: apptainer exec containername.sif python my_script.py","title":"In a Job Script"},{"location":"clusters-at-yale/guides/containers/#environment-variables","text":"If you are unsure if you are running inside or outside your container, you can run: echo $APPTAINER_NAME If you get back text, you are in your container. If you'd like to pass environment variables into your container, you can do so by defining them prefixed with APPTAINERENV_ . For Example: export APPTAINERENV_BLASTDB = /home/me/db/blast apptainer exec my_blast_image.sif env | grep BLAST Should return BLASTDB=/home/me/db/blast , which means you set the BLASTDB environment variable in the container properly.","title":"Environment Variables"},{"location":"clusters-at-yale/guides/containers/#additional-notes","text":"","title":"Additional Notes"},{"location":"clusters-at-yale/guides/containers/#mpi","text":"MPI support for Apptainer is relatively straight-forward. The only thing to watch is to make sure that you are using the same version of MPI inside your container as you are on the cluster.","title":"MPI"},{"location":"clusters-at-yale/guides/containers/#gpus","text":"You can use GPU-accelerated code inside your container, which will need most everything also installed in your container (e.g. CUDA, cuDNN). In order for your applications to have access to the right drivers on the host machine, use the --nv flag. For example: apptainer exec --nv tensorflow-10.0-py3.sif python ./my-tf-model.py","title":"GPUs"},{"location":"clusters-at-yale/guides/containers/#home-directories","text":"Sometimes the maintainer of a Docker container you are trying to use installed software into a special user's home directory. If you need access to someone's home directory that exists in the container and not on the host, you should add the --contain option. Unfortunately, you will also then have to explicitly tell Apptainer about the paths that you want to use from inside the container with the --bind option. apptainer shell --shell /bin/bash --contain --bind /gpfs/gibbs/project/support/be59:/home/be59/project bioconvert-latest.sif","title":"Home Directories"},{"location":"clusters-at-yale/guides/cryoem/","text":"Cryogenic Electron Microscopy (Cryo-EM) Data Processing on McCleary Below is a work in progress collection of general hints, tips and tricks for running your work on McCleary . As always, if anything below is unclear or could use updating, please let us know during office hours, via email or through our web ticketing system . Storage Be wary of you and your group's storage quotas. Run getquota from time to time to make sure there isn't usage you aren't expecting. We strongly recommend that you archive raw data off-cluster, as only home directories are backed up . Let us know if you need extra space and we can work with you to find a solution that is right for your project and your group. On most GPU nodes there is a fast SSD mounted at /tmp . You can use this as a fast local cache if your program can take advantage of it. Schedule Jobs Many Cryo-EM applications can make use of GPUs as co-processors. In order to use a GPU on McCleary you must allocate a job on a partition with GPUs available and explicitly request GPU(s). Make sure to familiarize yourself with our documentation on scheduling jobs and requesting specific resources . 
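As a minimal sketch of what an explicit GPU request can look like (the partition name, time limit, and resource amounts below are placeholders; adjust them to your workload and to the partitions you actually have access to):

salloc -p gpu --gpus=1 -c 2 --mem=16G -t 4:00:00

or, in a batch script header:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G
#SBATCH --time=4:00:00

Without an explicit GPU request (for example --gpus or --gpus-per-task ), no GPU will be visible to your job even on a GPU partition.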
In addition to public partitions that give you access to GPUs, there are pi_cryoem and pi_tomo partitions which are limited to users of the Cryo-EM resources on campus. Please coordinate with the staff from West Campus and CCMI ( See here for contact info ) for access. Software Many Cryo-EM applications are meant to be viewed and interacted with in real-time. This mode of working is not ideal for the way most HPC clusters are set up, so where possible try to prototype a job you would like to run with a smaller dataset or subset of your data. Then develop a script to submit with sbatch . RELION The RELION pipeline operates in two modes. You can use it as a more familiar and beginner-friendly graphical interface, or call the programs involved directly. Once you are comfortable, using the commands directly in scripts submitted with sbatch will allow you to get the most work done the fastest. The authors provide up-to-date hints about performance on their Benchmarks page. If you need technical help (jobs submit fine but you are having other issues) you should search and submit to their mailing list . Module We have GPU-enabled versions of RELION available on McCleary as software modules . To check which versions are available, run module avail relion . To see specific notes about a particular install, you can use module help , e.g. module help RELION/4.0.0-fosscuda-2020b . Example Job Parameters RELION reserves one worker (slurm task) for orchestrating an MPI-based job, which they call the \"master\". This can lead to inefficient jobs where there are tasks that could be using a GPU but are stuck being the master process. You can request a better layout for your job with a heterogeneous job , allocating CPUs on a cpu-only compute node for the task that will not use GPUs. Here is an example 3D refinement job submission script (replace choose_a_version with the version you want to use): #!/bin/bash #SBATCH --partition=general --ntasks 1 -c2 --job-name=class3D_hetero_01 --mem=10G --output=\"class3D_hetero_01-%j.out\" #SBATCH hetjob #SBATCH --partition=gpu --ntasks 4 -c2 -N1 --mem-per-cpu=16G --gpus-per-task=1 module load RELION/choose_a_version srun --pack-group = 0 ,1 relion_refine_mpi --o hetero/refine3D/job0001 ... --dont_combine_weights_via_disc --j ${ SLURM_CPUS_PER_TASK } --gpu This job submission request will result in RELION using a single task/worker on a general-purpose CPU node, and will efficiently allocate four GPUs even if they aren't all available on the same compute node. Each GPU node task/worker will have a dedicated GPU, two CPU cores, and 30GiB total memory. EMAN2 EMAN2 has always been a bit of a struggle for us to install properly on the clusters. Below are a few options. Conda Install The EMAN2 authors offer some instructions on how to get EMAN2 running in a cluster environment on their install page . The default install may work as well if you avoid using MPI. Container At present, we have a mostly working apptainer container for EMAN2.3 available here: /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif To run a program from EMAN2 using this container you would use a command like: apptainer exec /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif e2projectmanager.py Cryosparc We have a whole separate page about this one; it is a bit involved. Other Software We have CCP4, Phenix and some other software modules of interest installed. Run module avail and the software name to search for them.
If you can't find one you need, please contact us .","title":"Cryo-EM on McCleary"},{"location":"clusters-at-yale/guides/cryoem/#cryogenic-electron-microscopy-cryo-em-data-processing-on-mccleary","text":"Below is a work in progress collection of general hints, tips and tricks for running your work on McCleary . As always, if anything below is unclear or could use updating, please let us know during office hours, via email or through our web ticketing system .","title":"Cryogenic Electron Microscopy (Cryo-EM) Data Processing on McCleary"},{"location":"clusters-at-yale/guides/cryoem/#storage","text":"Be wary of you and your group's storage quotas. Run getquota from time to time to make sure there isn't usage you aren't expecting. We strongly recommend that you archive raw data off-cluster, as only home directories are backed up . Let us know if you need extra space and we can work with you to find a solution that is right for your project and your group. On most GPU nodes there is a fast SSD mounted at /tmp . You can use this as a fast local cache if your program can take advantage of it.","title":"Storage"},{"location":"clusters-at-yale/guides/cryoem/#schedule-jobs","text":"Many Cryo-EM applications can make use of GPUs as co-processors. In order to use a GPU on McCleary you must allocate a job on a partition with GPUs available and explicitly request GPU(s). Make sure to familiarize yourself with our documentation on scheduling jobs and requesting specific resources . In addition to public partitions that give you access to GPUs, there are pi_cryoem and pi_tomo partitions which are limited to users of the Cryo-EM resources on campus. Please coordinate with the staff from West Campus and CCMI ( See here for contact info ) for access.","title":"Schedule Jobs"},{"location":"clusters-at-yale/guides/cryoem/#software","text":"Many Cryo-EM applications are meant to be viewed and interacted with in real-time. This mode of working is not ideal for the way most HPC clusters are set up, so where possible try to prototype a job you would like to run with a smaller dataset or subset of your data. Then develop a script to submit with sbatch .","title":"Software"},{"location":"clusters-at-yale/guides/cryoem/#relion","text":"The RELION pipeline operates in two modes. You can use it as a more familiar and beginner-friendly graphical interface, or call the programs involved directly. Once you are comfortable, using the commands directly in scripts submitted with sbatch will allow you to get the most work done the fastest. The authors provide up-to-date hints about performance on their Benchmarks page. If you need technical help (jobs submit fine but having other issues) you should search and submit to their mailing list .","title":"RELION"},{"location":"clusters-at-yale/guides/cryoem/#module","text":"We have GPU-enabled versions of RELION available on McCleary as software modules . To check witch versions are available, run module avail relion . To see specific notes about a particular install, you can use module help , e.g. module help RELION/4.0.0-fosscuda-2020b .","title":"Module"},{"location":"clusters-at-yale/guides/cryoem/#example-job-parameters","text":"RELION reserves one worker (slurm task) for orchestrating an MPI-based job, which they call the \"master\". This can lead to inefficient jobs where there are tasks that could be using a GPU but are stuck being the master process. 
You can request a better layout for your job with a heterogeneous job , allocating CPUs on a cpu-only compute node for the task that will not use GPUs. Here is an example 3D refinement job submission script (replace choose_a_version with the version you want to use): #!/bin/bash #SBATCH --partition=general --ntasks 1 -c2 --job-name=class3D_hetero_01 --mem=10G --output=\"class3D_hetero_01-%j.out\" #SBATCH hetjob #SBATCH --partition=gpu --ntasks 4 -c2 -N1 --mem-per-cpu=16G --gpus-per-task=1 module load RELION/choose_a_version srun --pack-group = 0 ,1 relion_refine_mpi --o hetero/refine3D/job0001 ... --dont_combine_weights_via_disc --j ${ SLURM_CPUS_PER_TASK } --gpu This job submission request will result in RELION using a single task/worker on a general-purpose CPU node, and will efficiently allocate four GPUs even if they aren't all available on the same compute node. Each GPU node task/worker will have a dedicated GPU, two CPU cores, and 30GiB total memory.","title":"Example Job Parameters"},{"location":"clusters-at-yale/guides/cryoem/#eman2","text":"EMAN2 has always been a bit of a struggle for us to install properly on the clusters. Below are a few options.","title":"EMAN2"},{"location":"clusters-at-yale/guides/cryoem/#conda-install","text":"The EMAN2 authors offer some instructions on how to get EMAN2 running in a cluster environment on their install page . The default install may work as well if you avoid using MPI.","title":"Conda Install"},{"location":"clusters-at-yale/guides/cryoem/#container","text":"At present, we have a mostly working apptainer container for EMAN2.3 available here: /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif To run a program from EMAN2 using this container you would use a command like: apptainer exec /gpfs/ysm/datasets/cryoem/eman2.3_ubuntu18.04.sif e2projectmanager.py","title":"Container"},{"location":"clusters-at-yale/guides/cryoem/#cryosparc","text":"We have a whole separate page about this one; it is a bit involved.","title":"Cryosparc"},{"location":"clusters-at-yale/guides/cryoem/#other-software","text":"We have CCP4, Phenix and some other software modules of interest installed. Run module avail and the software name to search for them. If you can't find one you need, please contact us .","title":"Other Software"},{"location":"clusters-at-yale/guides/cryosparc/","text":"cryoSPARCv2 on Farnam Getting cryoSPARC set up and running on the YCRC clusters is something of a task. This guide is meant for intermediate/advanced users. If enough people can convince Structura Bio ( see ticket here ) to make cryoSPARC more cluster-friendly we could have a single instance running that you'd just log in to with your Yale credentials. Until then, venture below at your own peril. Install Before you get started, you will need to request a license from Structura via their website . These instructions are gently modified from the official cryoSPARC documentation . 1. Set up Environment First allocate an interactive job on a compute node to run the install on. salloc --cpus-per-task 2 Then, set the following environment variables to suit your install. We filled in some defaults for you. # where to install cryosparc2 and its sample database install_path = $( readlink -f ${ HOME } /project ) /software/cryosparc2 # the license ID you got from Structura license_id = # your email my_email = $( head -n1 ~/.forward ) # slurm partition to submit your cryosparc jobs to # not sure you can change at runtime? partition = gpu 2.
Set up Directories, Download installers # your username my_name = ${ USER } # a temp password cryosparc_passwd = Password123 # load the right CUDA module load CUDA/9.0.176 # set up some more paths db_path = ${ install_path } /database worker_path = ${ install_path } /cryosparc2_worker ssd_path = /tmp/ ${ USER } /cryosparc2_cache # go get the installers mkdir -p $install_path cd $install_path curl -sL https://get.cryosparc.com/download/master-latest/ $license_id > cryosparc2_master.tar.gz curl -sL https://get.cryosparc.com/download/worker-latest/ $license_id > cryosparc2_worker.tar.gz tar -xf cryosparc2_master.tar.gz tar -xf cryosparc2_worker.tar.gz 3. Install the Server and Worker cd ${ install_path } /cryosparc2_master ./install.sh --license $license_id --hostname $( hostname ) --dbpath $db_path --yes source ~/.bashrc cd ${ install_path } /cryosparc2_worker ./install.sh --license $license_id --cudapath $CUDA_HOME --yes source ~/.bashrc 4. Configure for Farnam # Farnam cluster setup mkdir -p ${ install_path } /site_configs && cd ${ install_path } /site_configs cat << EOF > cluster_info.json { \"name\" : \"farnam\", \"worker_bin_path\" : \"${install_path}/cryosparc2_worker/bin/cryosparcw\", \"cache_path\" : \"/tmp/{{ cryosparc_username }}/cryosparc_cache\", \"send_cmd_tpl\" : \"{{ command }}\", \"qsub_cmd_tpl\" : \"sbatch {{ script_path_abs }}\", \"qstat_cmd_tpl\" : \"squeue -j {{ cluster_job_id }}\", \"qdel_cmd_tpl\" : \"scancel {{ cluster_job_id }}\", \"qinfo_cmd_tpl\" : \"sinfo\" } EOF cat << EOF > cluster_script.sh #!/usr/bin/env bash #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} #SBATCH -c {{ num_cpu }} #SBATCH --gpus={{ num_gpu }} #SBATCH -p ${partition} #SBATCH --mem={{ (ram_gb*1024)|int }} #SBATCH -o {{ job_dir_abs }} #SBATCH -e {{ job_dir_abs }} module load CUDA/9.0.176 mkdir -p /tmp/${USER}/cryosparc2_cache {{ run_cmd }} EOF Run salloc --cpus-per-task 2 master_host = $( hostname ) base_dir = $( dirname \" $( dirname \" $( which cryosparcm ) \" ) \" ) sed -i.bak 's/export CRYOSPARC_MASTER_HOSTNAME.*$/export CRYOSPARC_MASTER_HOSTNAME=\\\"' \" $master_host \" '\\\"/g' $base_dir /config.sh source $base_dir /config.sh cryosparcm start cryosparcm status # run the output from the following command on your local linux/mac machine echo \"ssh -N -L $CRYOSPARC_BASE_PORT : $master_host : $CRYOSPARC_BASE_PORT $USER @mccleary.ycrc.yale.edu\" Database errors If your database won't start and you're sure there isn't another server running, you can remove lock files and try again. # rm -f $CRYOSPARC_DB_PATH/WiredTiger.lock $CRYOSPARC_DB_PATH/mongod.lock","title":"cryoSPARCv2 on Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#cryosparcv2-on-farnam","text":"Getting cryoSPARC set up and running on the YCRC clusters is something of a task. This guide is meant for intermediate/advanced users. If enought people can convince Structura bio ( see ticket here ) to make cryoSPARC more cluster-friendly we could have a single instance running that you'd just log in to with your Yale credentials. Until then, venture below at your own peril.","title":"cryoSPARCv2 on Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#install","text":"Before you get started, you will need to request a licence from Structura from their website . These instructions are gently modified from the official cryoSPARC documentation .","title":"Install"},{"location":"clusters-at-yale/guides/cryosparc/#1-set-up-environment","text":"First allocate an interactive job on a compute node to run the install on. 
salloc --cpus-per-task 2 Then, set the following environment variables to suit your install. We filled in some defaults for you. # where to install cryosparc2 and its sample database install_path = $( readlink -f ${ HOME } /project ) /software/cryosparc2 # the license ID you got from Structura license_id = # your email my_email = $( head -n1 ~/.forward ) # slurm partition to submit your cryosparc jobs to # not sure you can change at runtime? partition = gpu","title":"1. Set up Environment"},{"location":"clusters-at-yale/guides/cryosparc/#2-set-up-directories-download-installers","text":"# your username my_name = ${ USER } # a temp password cryosparc_passwd = Password123 # load the right CUDA module load CUDA/9.0.176 # set up some more paths db_path = ${ install_path } /database worker_path = ${ install_path } /cryosparc2_worker ssd_path = /tmp/ ${ USER } /cryosparc2_cache # go get the installers mkdir -p $install_path cd $install_path curl -sL https://get.cryosparc.com/download/master-latest/ $license_id > cryosparc2_master.tar.gz curl -sL https://get.cryosparc.com/download/worker-latest/ $license_id > cryosparc2_worker.tar.gz tar -xf cryosparc2_master.tar.gz tar -xf cryosparc2_worker.tar.gz","title":"2. Set up Directories, Download installers"},{"location":"clusters-at-yale/guides/cryosparc/#3-install-the-server-and-worker","text":"cd ${ install_path } /cryosparc2_master ./install.sh --license $license_id --hostname $( hostname ) --dbpath $db_path --yes source ~/.bashrc cd ${ install_path } /cryosparc2_worker ./install.sh --license $license_id --cudapath $CUDA_HOME --yes source ~/.bashrc","title":"3. Install the Server and Worker"},{"location":"clusters-at-yale/guides/cryosparc/#4-configure-for-farnam","text":"# Farnam cluster setup mkdir -p ${ install_path } /site_configs && cd ${ install_path } /site_configs cat << EOF > cluster_info.json { \"name\" : \"farnam\", \"worker_bin_path\" : \"${install_path}/cryosparc2_worker/bin/cryosparcw\", \"cache_path\" : \"/tmp/{{ cryosparc_username }}/cryosparc_cache\", \"send_cmd_tpl\" : \"{{ command }}\", \"qsub_cmd_tpl\" : \"sbatch {{ script_path_abs }}\", \"qstat_cmd_tpl\" : \"squeue -j {{ cluster_job_id }}\", \"qdel_cmd_tpl\" : \"scancel {{ cluster_job_id }}\", \"qinfo_cmd_tpl\" : \"sinfo\" } EOF cat << EOF > cluster_script.sh #!/usr/bin/env bash #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} #SBATCH -c {{ num_cpu }} #SBATCH --gpus={{ num_gpu }} #SBATCH -p ${partition} #SBATCH --mem={{ (ram_gb*1024)|int }} #SBATCH -o {{ job_dir_abs }} #SBATCH -e {{ job_dir_abs }} module load CUDA/9.0.176 mkdir -p /tmp/${USER}/cryosparc2_cache {{ run_cmd }} EOF","title":"4. Configure for Farnam"},{"location":"clusters-at-yale/guides/cryosparc/#run","text":"salloc --cpus-per-task 2 master_host = $( hostname ) base_dir = $( dirname \" $( dirname \" $( which cryosparcm ) \" ) \" ) sed -i.bak 's/export CRYOSPARC_MASTER_HOSTNAME.*$/export CRYOSPARC_MASTER_HOSTNAME=\\\"' \" $master_host \" '\\\"/g' $base_dir /config.sh source $base_dir /config.sh cryosparcm start cryosparcm status # run the output from the following command on your local linux/mac machine echo \"ssh -N -L $CRYOSPARC_BASE_PORT : $master_host : $CRYOSPARC_BASE_PORT $USER @mccleary.ycrc.yale.edu\"","title":"Run"},{"location":"clusters-at-yale/guides/cryosparc/#database-errors","text":"If your database won't start and you're sure there isn't another server running, you can remove lock files and try again. 
# rm -f $CRYOSPARC_DB_PATH/WiredTiger.lock $CRYOSPARC_DB_PATH/mongod.lock","title":"Database errors"},{"location":"clusters-at-yale/guides/gaussian/","text":"Gaussian Note Access to Gaussian on the Yale clusters is free, but available by request only. To gain access to the installations of Gaussian, please contact us to be added to the gaussian group. Gaussian is an electronic structure modeling program that Yale has licensed for its HPC clusters. The latest version of Gaussian is Gaussian 16, which also includes GaussView 6. Older versions of both applications are also available. To see a full list of available versions of Gaussian on the cluster, run: module avail gaussian Running Gaussian on the Cluster The examples here are for Gaussian 16. In most cases, you could run the older version Gaussian 09 by replacing \"g16\" with \"g09\" wherever it occurs. When running Gaussian, it is recommended that users request exclusive access to allocated nodes (e.g., by requesting all the cpus on the node) and that they specify the largest possible memory allocation for the number of nodes requested. In addition, in most cases, the scratch storage location (set by the environment variable GAUSS_SCRDIR ) should be on the local parallel scratch file system (e.g., scratch60) of the cluster, rather than in the user\u2019s home directory. (This is the default in the Gaussian module files.) Before running Gaussian, you must set up a number of environment variables. This is accomplished most easily by loading the Gaussian module file using: module load Gaussian To run Gaussian interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc -c 4 -t 4 :00:00 See our Slurm documentation for more detailed information on requesting resources for interactive jobs. GaussView In connection with Gaussian 16, we have also installed GaussView 6, Gaussian Inc.'s most advanced and powerful graphical interface for Gaussian. With GaussView, you can import or build the molecular structures that interest you; set up, launch, monitor and control Gaussian calculations; and retrieve and view the results, all without ever leaving the application. GaussView 6 includes many new features designed to make working with large systems of chemical interest convenient and straightforward. It also provides full support for all of the new modeling methods and features in Gaussian 16. In order to use GaussView, you must run an X Server on your desktop or laptop, and you must enable X forwarding when logging into the cluster. See our X11 forwarding documentation for instructions. Loading the module file for Gaussian sets up your environment for GaussView as well. Then you can start GaussView by typing the command gv . GaussView 6 may not be compatible with certain versions of the X servers you may run on your desktop or laptop. If you encounter problems, these can often be overcome by starting GaussView with the command gv -mesagl or gv -soft .","title":"Gaussian"},{"location":"clusters-at-yale/guides/gaussian/#gaussian","text":"Note Access to Gaussian on the Yale clusters is free, but available by request only. To gain access to the installations of Gaussian, please contact us to be added to the gaussian group. Gaussian is an electronic structure modeling program that Yale has licensed for its HPC clusters. The latest version of Gaussian is Gaussian 16, which also includes GaussView 6. Older versions of both applications are also available. 
To see a full list of available versions of Gaussian on the cluster, run: module avail gaussian","title":"Gaussian"},{"location":"clusters-at-yale/guides/gaussian/#running-gaussian-on-the-cluster","text":"The examples here are for Gaussian 16. In most cases, you could run the older version Gaussian 09 by replacing \"g16\" with \"g09\" wherever it occurs. When running Gaussian, it is recommended that users request exclusive access to allocated nodes (e.g., by requesting all the cpus on the node) and that they specify the largest possible memory allocation for the number of nodes requested. In addition, in most cases, the scratch storage location (set by the environment variable GAUSS_SCRDIR ) should be on the local parallel scratch file system (e.g., scratch60) of the cluster, rather than in the user\u2019s home directory. (This is the default in the Gaussian module files.) Before running Gaussian, you must set up a number of environment variables. This is accomplished most easily by loading the Gaussian module file using: module load Gaussian To run Gaussian interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc -c 4 -t 4 :00:00 See our Slurm documentation for more detailed information on requesting resources for interactive jobs.","title":"Running Gaussian on the Cluster"},{"location":"clusters-at-yale/guides/gaussian/#gaussview","text":"In connection with Gaussian 16, we have also installed GaussView 6, Gaussian Inc.'s most advanced and powerful graphical interface for Gaussian. With GaussView, you can import or build the molecular structures that interest you; set up, launch, monitor and control Gaussian calculations; and retrieve and view the results, all without ever leaving the application. GaussView 6 includes many new features designed to make working with large systems of chemical interest convenient and straightforward. It also provides full support for all of the new modeling methods and features in Gaussian 16. In order to use GaussView, you must run an X Server on your desktop or laptop, and you must enable X forwarding when logging into the cluster. See our X11 forwarding documentation for instructions. Loading the module file for Gaussian sets up your environment for GaussView as well. Then you can start GaussView by typing the command gv . GaussView 6 may not be compatible with certain versions of the X servers you may run on your desktop or laptop. If you encounter problems, these can often be overcome by starting GaussView with the command gv -mesagl or gv -soft .","title":"GaussView"},{"location":"clusters-at-yale/guides/github/","text":"Version control with Git and GitHub What is version control? Version contol is an easy and powerful way to track changes to your work. This extends from code to writing documents (if using LaTeX/Tex). It produces and saves \"tagged\" copies of your project so that you don't need to worry about breaking your code-base. This provides a \"coding safety net\" to let you try new features while retaining the ability to roll-back to a working version. Whether developing large frameworks or simply working on small scripts, version control is an important tool to ensure that your work is never lost. We recommend using git for its flexibility and versatility and GitHub for its power in enabling research and collaboration. 1 Here we will cover the basics of version control and how to use git and GitHub. What is git and how does it work? 
Git is a tool that tracks changes to a file (or set of files) through a series of snapshots called \"commits\" or \"revisions\". These snapshots are stored in \"repositories\" which contain the history of all the changes to that file. This helps prevent repetative naming or project_final_final2_v3.txt problems. It acts as a record of all the edits, along with the ability to compare the current version to previous commits. How to create a git repository You can create a repository at any time by running the following commands: cd /path/to/your/project # initialize the repository git init # add files to be tracked git add main.py input.txt # commit the files to the repository, creating the first snapshot git commit -m \"Initial Commit\" This sets up a repository containing a single snapshot of the project's two files. We can then edit these files and commit the changes into a new snapshot: # edit files echo \"changed this file\" >> input.txt $ git status On branch main Changes not staged for commit: ( use \"git add ...\" to update what will be committed ) ( use \"git checkout -- ...\" to discard changes in working directory ) modified: input.txt no changes added to commit ( use \"git add\" and/or \"git commit -a\" ) Finally, we can stage input.txt and then commit the changes: # stage changes for commit git add input.txt git commit -m \"modified input file\" Configuring git It's very helpful to configure your email and username with git : git config --global user.name \"Your Name\" git config --global user.email \"your.email@yale.edu\" This will then tag your changes with your name and email when collaborating with people on a larger project. Working with remote repositories on GitHub We recommend using an off-site repository like GitHub that provides a secure and co-located backup of your local repositories. To start, create a repository on GitHub by going to https://github.com/new and providing a name and choose either public or private access. Then you can connect your local repository to the GitHub repo (named my_new_repo ): git remote add origin git@github.com:user_name/my_new_repo.git git push -u origin main Alternatively, a repository can be created on GitHub and then cloned to your local machine: $ git clone git@github.com:user_name/my_new_repo.git Cloning into 'my_new_repo' ... remote: Enumerating objects: 3 , done . remote: Counting objects: 100 % ( 3 /3 ) , done . remote: Total 3 ( delta 0 ) , reused 0 ( delta 0 ) , pack-reused 0 Receiving objects: 100 % ( 3 /3 ) , done . This creates a new directory ( my_new_repo ) where you can place all your code. After making any changes and commiting them to the local repository, you can \"push\" them to a remote repository: # commit to local repository git commit -m \"new changes\" # push commits to remote repository on GitHub git push Educational GitHub All students and research staff are able to request free Educational discounts from GitHub. This provides a \"Pro\" account for free, including unlimited private repositories. To get started, create a free GitHub account with your Yale email address. Then go to https://education.github.com and request the educational discount. It normally takes less than 24 hours for them to grant the discount. Educational discounts are also available for teams and collaborations. This is perfect for a research group or collaboration and can include non-Yale affiliated people. 
Resources and links YCRC Version Control Bootcamp Educational GitHub GitHub's Try-it Instruqt Getting Started With Git We do not recommend the use of https://git.yale.edu , which is an internal-only tool not designed for research use. \u21a9","title":"GitHub"},{"location":"clusters-at-yale/guides/github/#version-control-with-git-and-github","text":"","title":"Version control with Git and GitHub"},{"location":"clusters-at-yale/guides/github/#what-is-version-control","text":"Version contol is an easy and powerful way to track changes to your work. This extends from code to writing documents (if using LaTeX/Tex). It produces and saves \"tagged\" copies of your project so that you don't need to worry about breaking your code-base. This provides a \"coding safety net\" to let you try new features while retaining the ability to roll-back to a working version. Whether developing large frameworks or simply working on small scripts, version control is an important tool to ensure that your work is never lost. We recommend using git for its flexibility and versatility and GitHub for its power in enabling research and collaboration. 1 Here we will cover the basics of version control and how to use git and GitHub.","title":"What is version control?"},{"location":"clusters-at-yale/guides/github/#what-is-git-and-how-does-it-work","text":"Git is a tool that tracks changes to a file (or set of files) through a series of snapshots called \"commits\" or \"revisions\". These snapshots are stored in \"repositories\" which contain the history of all the changes to that file. This helps prevent repetative naming or project_final_final2_v3.txt problems. It acts as a record of all the edits, along with the ability to compare the current version to previous commits.","title":"What is git and how does it work?"},{"location":"clusters-at-yale/guides/github/#how-to-create-a-git-repository","text":"You can create a repository at any time by running the following commands: cd /path/to/your/project # initialize the repository git init # add files to be tracked git add main.py input.txt # commit the files to the repository, creating the first snapshot git commit -m \"Initial Commit\" This sets up a repository containing a single snapshot of the project's two files. We can then edit these files and commit the changes into a new snapshot: # edit files echo \"changed this file\" >> input.txt $ git status On branch main Changes not staged for commit: ( use \"git add ...\" to update what will be committed ) ( use \"git checkout -- ...\" to discard changes in working directory ) modified: input.txt no changes added to commit ( use \"git add\" and/or \"git commit -a\" ) Finally, we can stage input.txt and then commit the changes: # stage changes for commit git add input.txt git commit -m \"modified input file\"","title":"How to create a git repository"},{"location":"clusters-at-yale/guides/github/#configuring-git","text":"It's very helpful to configure your email and username with git : git config --global user.name \"Your Name\" git config --global user.email \"your.email@yale.edu\" This will then tag your changes with your name and email when collaborating with people on a larger project.","title":"Configuring git"},{"location":"clusters-at-yale/guides/github/#working-with-remote-repositories-on-github","text":"We recommend using an off-site repository like GitHub that provides a secure and co-located backup of your local repositories. 
To start, create a repository on GitHub by going to https://github.com/new and providing a name and choose either public or private access. Then you can connect your local repository to the GitHub repo (named my_new_repo ): git remote add origin git@github.com:user_name/my_new_repo.git git push -u origin main Alternatively, a repository can be created on GitHub and then cloned to your local machine: $ git clone git@github.com:user_name/my_new_repo.git Cloning into 'my_new_repo' ... remote: Enumerating objects: 3 , done . remote: Counting objects: 100 % ( 3 /3 ) , done . remote: Total 3 ( delta 0 ) , reused 0 ( delta 0 ) , pack-reused 0 Receiving objects: 100 % ( 3 /3 ) , done . This creates a new directory ( my_new_repo ) where you can place all your code. After making any changes and commiting them to the local repository, you can \"push\" them to a remote repository: # commit to local repository git commit -m \"new changes\" # push commits to remote repository on GitHub git push","title":"Working with remote repositories on GitHub"},{"location":"clusters-at-yale/guides/github/#educational-github","text":"All students and research staff are able to request free Educational discounts from GitHub. This provides a \"Pro\" account for free, including unlimited private repositories. To get started, create a free GitHub account with your Yale email address. Then go to https://education.github.com and request the educational discount. It normally takes less than 24 hours for them to grant the discount. Educational discounts are also available for teams and collaborations. This is perfect for a research group or collaboration and can include non-Yale affiliated people.","title":"Educational GitHub"},{"location":"clusters-at-yale/guides/github/#resources-and-links","text":"YCRC Version Control Bootcamp Educational GitHub GitHub's Try-it Instruqt Getting Started With Git We do not recommend the use of https://git.yale.edu , which is an internal-only tool not designed for research use. \u21a9","title":"Resources and links"},{"location":"clusters-at-yale/guides/github_pages/","text":"GitHub Pages Personal Website A personal website is a great way to build an online presence for both academic and professional activities. We recommend using GitHub Pages as a tool to maintain and host static websites and blogs. Unlike other hosting platforms, the whole website can be written using Markdown , a simple widely-used markup language. GitHub provides a tutorial to get started with Markdown ( link ). To get started, you're going to need a GitHub account. You can follow the instructions on our GitHub guide to set up a free account. Once you have an account, you will need to create a repository for your website. It's important that you name your repository username.github.io where username is replaced with your actual account name ( ycrc-test in this example). Make sure to initialize the repo with a README, which will help get things started. After clicking \"Create\" your repository will look like this: From here, you can click on \"Settings\" to enable GitHub Pages publication of your site. Scroll down until you see GitHub Pages : GitHub provides a number of templates to help make your website look professional. Click on \"Choose a Theme\" to see examples of these themes: Pick one that you like and click \"Select theme\". Note, some of these themes are aimed at blogs versus project sites, pick one that best fits your desired style. You can change this later, so feel free to try one out and see what you think. 
After selecting your theme, you will be directed back to your repository where the README.md has been updated with some basics about how Markdown works and how you can start creating your website. Scroll down and commit these changes (leaving the sample text in place). You can now take a look at how GitHub is rendering your site: That's it, this site is now hosted at ycrc-test.github.io ! You now have a simple-to-edit and customize site that can be used to host your CV, detail your academic research, or showcase your independent projects. Project website In addition to hosting a stand-alone website, GitHub Pages can be used to create pages for specific projects or repositories. Here we will take an existing repository amazing-python-project and add a GitHub Pages website on a new branch. Click on the Branch pull-down and create a new branch titled gh-pages : Remove any files from that branch and create a new file called index.md : Add content to the page using Markdown syntax: To customize the site, click on Settings and then scroll down to GitHub Pages : Click on the Theme Chooser and select your favorite style: Finally, you can navigate to your website and see it live! Conclusions We have detailed two ways to add static websites to your work, either as a professional webpage or a project-specific site. This can help increase your works impact and give you a platform to showcase your work. Further Reading Jekyll : the tool that powers GitHub Pages GitHub Learning Lab Academic Pages : forkable template for academic websites Jekyll Academic Example GitHub Pages Websites GitHub and Government , https://github.com/github/government.github.com ElectronJS , https://github.com/electron/electronjs.org Twitter GitHub , https://github.com/twitter/twitter.github.io React , https://github.com/facebook/react","title":"GitHub Pages"},{"location":"clusters-at-yale/guides/github_pages/#github-pages","text":"","title":"GitHub Pages"},{"location":"clusters-at-yale/guides/github_pages/#personal-website","text":"A personal website is a great way to build an online presence for both academic and professional activities. We recommend using GitHub Pages as a tool to maintain and host static websites and blogs. Unlike other hosting platforms, the whole website can be written using Markdown , a simple widely-used markup language. GitHub provides a tutorial to get started with Markdown ( link ). To get started, you're going to need a GitHub account. You can follow the instructions on our GitHub guide to set up a free account. Once you have an account, you will need to create a repository for your website. It's important that you name your repository username.github.io where username is replaced with your actual account name ( ycrc-test in this example). Make sure to initialize the repo with a README, which will help get things started. After clicking \"Create\" your repository will look like this: From here, you can click on \"Settings\" to enable GitHub Pages publication of your site. Scroll down until you see GitHub Pages : GitHub provides a number of templates to help make your website look professional. Click on \"Choose a Theme\" to see examples of these themes: Pick one that you like and click \"Select theme\". Note, some of these themes are aimed at blogs versus project sites, pick one that best fits your desired style. You can change this later, so feel free to try one out and see what you think. 
After selecting your theme, you will be directed back to your repository where the README.md has been updated with some basics about how Markdown works and how you can start creating your website. Scroll down and commit these changes (leaving the sample text in place). You can now take a look at how GitHub is rendering your site: That's it, this site is now hosted at ycrc-test.github.io ! You now have a simple-to-edit and customize site that can be used to host your CV, detail your academic research, or showcase your independent projects.","title":"Personal Website"},{"location":"clusters-at-yale/guides/github_pages/#project-website","text":"In addition to hosting a stand-alone website, GitHub Pages can be used to create pages for specific projects or repositories. Here we will take an existing repository amazing-python-project and add a GitHub Pages website on a new branch. Click on the Branch pull-down and create a new branch titled gh-pages : Remove any files from that branch and create a new file called index.md : Add content to the page using Markdown syntax: To customize the site, click on Settings and then scroll down to GitHub Pages : Click on the Theme Chooser and select your favorite style: Finally, you can navigate to your website and see it live!","title":"Project website"},{"location":"clusters-at-yale/guides/github_pages/#conclusions","text":"We have detailed two ways to add static websites to your work, either as a professional webpage or a project-specific site. This can help increase your works impact and give you a platform to showcase your work.","title":"Conclusions"},{"location":"clusters-at-yale/guides/github_pages/#further-reading","text":"Jekyll : the tool that powers GitHub Pages GitHub Learning Lab Academic Pages : forkable template for academic websites Jekyll Academic","title":"Further Reading"},{"location":"clusters-at-yale/guides/github_pages/#example-github-pages-websites","text":"GitHub and Government , https://github.com/github/government.github.com ElectronJS , https://github.com/electron/electronjs.org Twitter GitHub , https://github.com/twitter/twitter.github.io React , https://github.com/facebook/react","title":"Example GitHub Pages Websites"},{"location":"clusters-at-yale/guides/gpus-cuda/","text":"GPUs and CUDA There are GPUs available for general use on the YCRC clusters. In order to use them, you must request them for your job . See the Grace , McCleary , and Milgram pages for hardware and partition specifics. Please do not use nodes with GPUs unless your application or job can make use of them. Any jobs submitted to a GPU partition without having requested a GPU may be terminated without warning. Monitor Activity and Drivers The CUDA libraries you load will allow you to compile code against them. To run CUDA-enabled code you must also be running on a node with a gpu allocated and a compatible driver installed. The minimum driver versions are listed on this nvidia developer site . You can check the available GPUs, their current usage, installed version of the nvidia drivers, and more with the command nvidia-smi . 
Either in an interactive job or after connecting to a node running your job with ssh , nvidia-smi output should look something like this: [ user@gpu01 ~ ] $ nvidia-smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460 .32.03 Driver Version: 460 .32.03 CUDA Version: 11 .2 | | -------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | =============================== + ====================== + ====================== | | 0 GeForce GTX 108 ... On | 00000000 :02:00.0 Off | N/A | | 23 % 34C P8 9W / 250W | 1MiB / 11178MiB | 0 % Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | | ============================================================================= | | No running processes found | +-----------------------------------------------------------------------------+ Here we see that the node gpu01 is running driver version 460.32.03 and is compatible with CUDA version 11.2. There are no processes using the GPU allocated to this job. Software Cuda, cuDNN, tensorflow, and pytorch availability on cluster We have built certain versions of CUDA, cuDNN, tensorflow, and pytorch on all the clusters YCRC maintains. If one of the versions of these modules aligns with the version needed for your research, then there may be no need to install these programs yourself. To list all the modules available for these programs: module avail cuda/ module avail cudnn/ module avail tensorflow module avail pytorch Tensorflow You can find hints about the correct version of Tensorflow from their tested build configurations . You can also test your install with a simple script that imports Tensorflow (run on a GPU node). If you get an ImportError that mentions missing libraries like libcublas.so.9.0 , for example, that means that Tensorflow is probably expecting CUDA v 9.0 but cannot find it. Tensorflow-gpu Tensorflow-gpu is now deprecated for newer versions of CUDA and cuDNN and has been combined with the original tensorflow. Any version of tensorflow 2.* contains GPU capabilities and should be installed instead of attempting to install tensorflow-gpu. Create an Example Tensorflow Environment To create a conda environment with Tensorflow that uses the module-provided CUDA: # load modules, including the system CUDA and cuDNN module load miniconda CUDAcore/11.3.1 cuDNN/8.2.1.32-CUDA-11.3.1 # save module collection for future use module save cuda11 # create environment with required dependencies conda create --name tf-modulecuda python = 3 .11.* numpy pandas matplotlib jupyter -c conda-forge # activate environment conda activate tf-modulecuda # use pip to install tensorflow pip install tensorflow == 2 .12.* The most up-to-date instructions for creating your own cuda/tensorflow environment can be found here .
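After building a TensorFlow environment, a quick sanity check (a sketch; run it from a job on a GPU node with the environment activated) is to ask TensorFlow whether it can see a GPU:

python -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"

If the printed list is empty, the environment is most likely not finding the CUDA/cuDNN libraries, so double-check the versions described above.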
To create a conda environment with your own versions of Cuda and tensorflow: For tensorflow 2.12+: module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .8.0 conda activate tf-condacuda pip install nvidia-cudnn-cu11 == 8 .6.0.163 # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'CUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .12.* For tensorflow 2.11.* module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .3.1 cudnn = 8 .2.1 conda activate tf-condacuda # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .11.* Use Your Environment To re-enter your environment you only need the following: module load miniconda conda activate tf-condacuda Or if using the module-installed CUDA: module restore cuda11 conda activate tf-modulecuda PyTorch As with Tensorflow, sometimes the conda-supplied CUDA libraries are sufficient for the version of PyTorch you are installing. If not make sure you have the version of CUDA referenced on the PyTorch site in their install instructions . They also provide instructions on installing previous versions compatible with older versions of CUDA. Following the instructions on their site, create a PyTorch environment using conda : module load miniconda conda create --name pytorch_env pytorch torchvision torchaudio pytorch-cuda = 11 .7 -c pytorch -c nvidia Compile .c or .cpp Files with CUDA code By default, nvcc expects that host code is in files with a .c or .cpp extension, and device code is in files with a .cu extension. When you mix device code in a .c or .cpp file with host code, the device code will not be recoganized by nvcc unless you add this flag: -x cu . nvcc -x cu mycuda.cpp -o mycuda.exe","title":"GPUs and CUDA"},{"location":"clusters-at-yale/guides/gpus-cuda/#gpus-and-cuda","text":"There are GPUs available for general use on the YCRC clusters. In order to use them, you must request them for your job . See the Grace , McCleary , and Milgram pages for hardware and partition specifics. Please do not use nodes with GPUs unless your application or job can make use of them. Any jobs submitted to a GPU partition without having requested a GPU may be terminated without warning.","title":"GPUs and CUDA"},{"location":"clusters-at-yale/guides/gpus-cuda/#monitor-activity-and-drivers","text":"The CUDA libraries you load will allow you to compile code against them. To run CUDA-enabled code you must also be running on a node with a gpu allocated and a compatible driver installed. The minimum driver versions are listed on this nvidia developer site . You can check the available GPUs, their current usage, installed version of the nvidia drivers, and more with the command nvidia-smi . 
Either in an interactive job or after connecting to a node running your job with ssh , nvidia-smi output should look something like this: [ user@gpu01 ~ ] $ nvidia-smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460 .32.03 Driver Version: 460 .32.03 CUDA Version: 11 .2 | | -------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | =============================== + ====================== + ====================== | | 0 GeForce GTX 108 ... On | 00000000 :02:00.0 Off | N/A | | 23 % 34C P8 9W / 250W | 1MiB / 11178MiB | 0 % Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | | ============================================================================= | | No running processes found | +-----------------------------------------------------------------------------+ Here we see that the node gpu01 is running driver version 460.32.03 and is compatible with CUDA version 11.2. There are no processes using the GPU allocated to this job.","title":"Monitor Activity and Drivers"},{"location":"clusters-at-yale/guides/gpus-cuda/#software","text":"","title":"Software"},{"location":"clusters-at-yale/guides/gpus-cuda/#cuda-cudnn-tensorflow-and-pytorch-availability-on-cluster","text":"We have built certain versions of CUDA, cuDNN, tensorflow, and pytorch on all the clusters YCRC maintains. If one of the versions of these modules aligns with the version needed for your research, then there may be no need to install these programs yourself. To list all the modules available for these programs: module avail cuda/ module avail cudnn/ module avail tensorflow module avail pytorch","title":"Cuda, cuDNN, tensorflow, and pytorch availability on cluster"},{"location":"clusters-at-yale/guides/gpus-cuda/#tensorflow","text":"You can find hints about the correct version of Tensorflow from their tested build configurations . You can also test your install with a simple script that imports Tensorflow (run on a GPU node). If you an ImportError that mentions missing libraries like libcublas.so.9.0 , for example, that means that Tensorflow is probably expecting CUDA v 9.0 but cannot find it.","title":"Tensorflow"},{"location":"clusters-at-yale/guides/gpus-cuda/#tensorflow-gpu","text":"Tensorflow-gpu is now depreciated for newer versions of CUDA and cuDNN and has been combined with the original tensorflow. 
Any version of tensorflow 2.* contains gpu capabilities and should be installed instead of attempting to install tensorflow-gpu.","title":"Tensorflow-gpu"},{"location":"clusters-at-yale/guides/gpus-cuda/#create-an-example-tensorflow-environment","text":"To create a conda environment with Tensorflow and uses the module CUDA: # load modules, including the system CUDA and cuDNN module load miniconda CUDAcore/11.3.1 cuDNN/8.2.1.32-CUDA-11.3.1 # save module collection for future use module save cuda11 #create environment with required dependencies conda create --name tf-modulecuda python = 3 .11.* numpy pandas matplotlib jupyter -c conda-forge # activate environment conda activate tf-modulecuda # use pip to install tensorflow pip install tensorflow == 2 .12.* The most up to date instructions for creating your own cuda/tensorflow environment can be found here . To create a conda environment with your own versions of Cuda and tensorflow: For tensorflow 2.12+: module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .8.0 conda activate tf-condacuda pip install nvidia-cudnn-cu11 == 8 .6.0.163 # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'CUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .12.* For tensorflow 2.11.* module load miniconda conda create --name tf-condacuda python numpy pandas matplotlib jupyter cudatoolkit = 11 .3.1 cudnn = 8 .2.1 conda activate tf-condacuda # Store system paths to cuda libraries for gpu communication mkdir -p $CONDA_PREFIX /etc/conda/activate.d echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX /etc/conda/activate.d/env_vars.sh #install tensorflow pip install tensorflow == 2 .11.*","title":"Create an Example Tensorflow Environment"},{"location":"clusters-at-yale/guides/gpus-cuda/#use-your-environment","text":"To re-enter your environment you only need the following: module load miniconda conda activate tf-condacuda Or if using the module-installed CUDA: module restore cuda11 conda activate tf-modulecuda","title":"Use Your Environment"},{"location":"clusters-at-yale/guides/gpus-cuda/#pytorch","text":"As with Tensorflow, sometimes the conda-supplied CUDA libraries are sufficient for the version of PyTorch you are installing. If not make sure you have the version of CUDA referenced on the PyTorch site in their install instructions . They also provide instructions on installing previous versions compatible with older versions of CUDA. Following the instructions on their site, create a PyTorch environment using conda : module load miniconda conda create --name pytorch_env pytorch torchvision torchaudio pytorch-cuda = 11 .7 -c pytorch -c nvidia","title":"PyTorch"},{"location":"clusters-at-yale/guides/gpus-cuda/#compile-c-or-cpp-files-with-cuda-code","text":"By default, nvcc expects that host code is in files with a .c or .cpp extension, and device code is in files with a .cu extension. When you mix device code in a .c or .cpp file with host code, the device code will not be recoganized by nvcc unless you add this flag: -x cu . 
nvcc -x cu mycuda.cpp -o mycuda.exe","title":"Compile .c or .cpp Files with CUDA code"},{"location":"clusters-at-yale/guides/isca/","text":"Isca Isca is a framework used for idealized global circulation modelling. We recommend that you install it for yourself individually as the code expects to be able to modify its source code files. It is relatively straightforward to install into a conda environment as described below. Install Isca Install it for just your user as a Python conda environment called \"isca\". module load netCDF-Fortran/4.5.3-gompi-2020b module load miniconda module save isca mkdir ~/programs cd ~/programs git clone https://www.github.com/execlim/isca.git conda create -n isca python=3.7 conda activate isca conda install tqdm cd isca/src/extra/python pip install -e . Then add the following to your .bashrc file # Isca # directory of the Isca source code export GFDL_BASE=$HOME/programs/isca # \"environment\" configuration for grace export GFDL_ENV=gfortran # temporary working directory used in running the model export GFDL_WORK=$PALMER_SCRATCH/gfdl_work # directory for storing model output export GFDL_DATA=$GIBBS_PROJECT/gfdl_data Select an Experiment and Update the Flags We are using GCC version 10.x for this build, so a slight modification needs to be made to Isca for it to build . Add the following line to the experiment script (e.g. $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py ), after cb is defined (so about line 13 in that file). cb.compile_flags.extend(['-fallow-argument-mismatch', '-fallow-invalid-boz']) Run Isca The above commands only need to be run once to set everything up. To use it, you will first always need to run: module restore isca conda activate isca Then you should be able to compile and launch your ISCA models.","title":"Isca"},{"location":"clusters-at-yale/guides/isca/#isca","text":"Isca is a framework used for idealized global circulation modelling. We recommend that you install it for yourself individually as the code expects to be able to modify its source code files. It is relatively straightforward to install into a conda environment as described below.","title":"Isca"},{"location":"clusters-at-yale/guides/isca/#install-isca","text":"Install it for just your user as a Python conda environment called \"isca\". module load netCDF-Fortran/4.5.3-gompi-2020b module load miniconda module save isca mkdir ~/programs cd ~/programs git clone https://www.github.com/execlim/isca.git conda create -n isca python=3.7 conda activate isca conda install tqdm cd isca/src/extra/python pip install -e . Then add the following to your .bashrc file # Isca # directory of the Isca source code export GFDL_BASE=$HOME/programs/isca # \"environment\" configuration for grace export GFDL_ENV=gfortran # temporary working directory used in running the model export GFDL_WORK=$PALMER_SCRATCH/gfdl_work # directory for storing model output export GFDL_DATA=$GIBBS_PROJECT/gfdl_data","title":"Install Isca"},{"location":"clusters-at-yale/guides/isca/#select-an-experiment-and-update-the-flags","text":"We are using GCC version 10.x for this build, so a slight modification needs to be made to Isca for it to build . Add the following line to the experiment script (e.g. $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py ), after cb is defined (so about line 13 in that file). 
cb.compile_flags.extend(['-fallow-argument-mismatch', '-fallow-invalid-boz'])","title":"Select an Experiment and Update the Flags"},{"location":"clusters-at-yale/guides/isca/#run-isca","text":"The above commands only need to be run once to set everything up. To use it, you will first always need to run: module restore isca conda activate isca Then you should be able to compile and launch your ISCA models.","title":"Run Isca"},{"location":"clusters-at-yale/guides/jupyter/","text":"Jupyter Notebooks We provide a simple way to start Jupyter Notebook interfaces for Python and R using Open OnDemand . Jupyter notebooks provide a flexible way to interactively work with code and plots presented in-line together. To get started choose Jupyter Notebook from the OOD Interactive Apps menu or click on the link on the dashboard. Before you get started, you will need to be on campus or logged in to the Yale VPN and you will need to set up a Jupyter environment. Set up an environment We recommend you use miniconda to manage your Jupyter environments. You can create Conda environments from the OOD shell interface or from a terminal-based login to the clusters. For example, if you want to create an environment with many commonly used scientific computing Python packages you would run: module load miniconda conda create -y -n notebook_env python jupyter numpy pandas matplotlib Specify your resource request You can use the ycrc_default environment or chose one of your own from the drop-down menu. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Tip If you have installed and want to use Jupyter Lab click the Start JupyterLab checkbox. If there is a specific workflow which OOD does not satisfy, let us know and we can help.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/jupyter/#jupyter-notebooks","text":"We provide a simple way to start Jupyter Notebook interfaces for Python and R using Open OnDemand . Jupyter notebooks provide a flexible way to interactively work with code and plots presented in-line together. To get started choose Jupyter Notebook from the OOD Interactive Apps menu or click on the link on the dashboard. Before you get started, you will need to be on campus or logged in to the Yale VPN and you will need to set up a Jupyter environment.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/jupyter/#set-up-an-environment","text":"We recommend you use miniconda to manage your Jupyter environments. You can create Conda environments from the OOD shell interface or from a terminal-based login to the clusters. For example, if you want to create an environment with many commonly used scientific computing Python packages you would run: module load miniconda conda create -y -n notebook_env python jupyter numpy pandas matplotlib","title":"Set up an environment"},{"location":"clusters-at-yale/guides/jupyter/#specify-your-resource-request","text":"You can use the ycrc_default environment or chose one of your own from the drop-down menu. After specifying the required resources (number of CPUs/GPUs, amount of RAM, etc.), you can submit the job. When it launches you can open the standard Jupyter interface where you can start working with notebooks. Tip If you have installed and want to use Jupyter Lab click the Start JupyterLab checkbox. 
If there is a specific workflow which OOD does not satisfy, let us know and we can help.","title":"Specify your resource request"},{"location":"clusters-at-yale/guides/jupyter_ssh/","text":"Jupyter Notebooks over SSH Port Forwarding If you want finer control over your notebook job, or wish to use something besides conda for your Python environment, you can manually configure a Jupyter notebook and connect manually. The main steps are: Start a Jupyter notebook job. Start an ssh tunnel. Use your local browser to connect. Start the Server Here is a template for submitting a jupyter-notebook server as a batch job. You may need to edit some of the slurm options, including the time limit or the partition. You will also need to either load a module that contains jupyter-notebook . Tip If you are using a Conda environment, please follow the instructions for launching a Jupyter session via Open OnDemand . Save your edited version of this script on the cluster, and submit it with sbatch . #!/bin/bash #SBATCH --partition devel #SBATCH --cpus-per-task 1 #SBATCH --mem-per-cpu 8G #SBATCH --time 6:00:00 #SBATCH --job-name jupyter-notebook #SBATCH --output jupyter-notebook-%J.log # get tunneling info XDG_RUNTIME_DIR = \"\" port = $( shuf -i8000-9999 -n1 ) node = $( hostname -s ) user = $( whoami ) cluster = $( hostname -f | awk -F \".\" '{print $2}' ) # print tunneling instructions jupyter-log echo -e \" For more info and how to connect from windows, see https://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/ MacOS or linux terminal command to create your ssh tunnel ssh -N -L ${ port } : ${ node } : ${ port } ${ user } @ ${ cluster } .ycrc.yale.edu Windows MobaXterm info Forwarded port:same as remote port Remote server: ${ node } Remote port: ${ port } SSH server: ${ cluster } .ycrc.yale.edu SSH login: $user SSH port: 22 Use a Browser on your local machine to go to: localhost: ${ port } (prefix w/ https:// if using password) \" # load modules or conda environments here jupyter-notebook --no-browser --port = ${ port } --ip = ${ node } Start the Tunnel Once you have submitted your job and it starts, your notebook server will be ready for you to connect. You can run squeue -u${USER} to check. You will see an \"R\" in the ST or status column for your notebook job if it is running. If you see a \"PD\" in the status column, you will have to wait for your job to start running to connect. The log file with information about how to connect will be in the directory you submitted the script from, and be named jupyter-notebook-[jobid].log where jobid is the slurm id for your job. MacOS and Linux On a Mac or Linux machine, you can start the tunnel with an SSH command. You can check the output from the job you started to get the specifc info you need. Windows On a Windows machine, we recommend you use MobaXterm. See our guide on connecting with MobaXterm for instructions on how to get set up. You will need to take a look at your job's log file to get the details you need. Then start MobaXterm: Under Tools choose \"MobaSSHTunnel (port forwarding)\". Click the \"New SSH Tunnel\" button. Click the radio button for \"Local port forwarding\". Use the information in your jupyter notebook log file to fill out the boxes. Click Save. On your new tunnel, click the key symbol under the settings column and choose your ssh private key. Click the play button under the Start/Stop column. 
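Before moving on to the browser, it can help to confirm that the tunnel is actually listening on your local machine. A minimal check (a sketch; substitute the port reported in your own jupyter-notebook log file for 9230, and note that nc may not be available on every system):
# run on your local machine after starting the tunnel
nc -z localhost 9230 && echo \"tunnel is up\" || echo \"nothing is listening on that port yet\"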
Browse the Notebook Finally, open a web browser on your local machine and enter the address http://localhost:port where port is the one specified in your log file. The address Jupyter creates by default (the one with the name of a compute node) will not work outside the cluster's network. Since version 5 of jupyter, the notebook will automatically generate a token that allows you to authenticate when you connect. It is long, and will be at the end of the url jupyter generates. It will look something like http://c14n06:9230/?token=**ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20** If you run into trouble or need help, contact us .","title":"Jupyter Notebooks over SSH Port Forwarding"},{"location":"clusters-at-yale/guides/jupyter_ssh/#jupyter-notebooks-over-ssh-port-forwarding","text":"If you want finer control over your notebook job, or wish to use something besides conda for your Python environment, you can manually configure a Jupyter notebook and connect manually. The main steps are: Start a Jupyter notebook job. Start an ssh tunnel. Use your local browser to connect.","title":"Jupyter Notebooks over SSH Port Forwarding"},{"location":"clusters-at-yale/guides/jupyter_ssh/#start-the-server","text":"Here is a template for submitting a jupyter-notebook server as a batch job. You may need to edit some of the slurm options, including the time limit or the partition. You will also need to either load a module that contains jupyter-notebook . Tip If you are using a Conda environment, please follow the instructions for launching a Jupyter session via Open OnDemand . Save your edited version of this script on the cluster, and submit it with sbatch . #!/bin/bash #SBATCH --partition devel #SBATCH --cpus-per-task 1 #SBATCH --mem-per-cpu 8G #SBATCH --time 6:00:00 #SBATCH --job-name jupyter-notebook #SBATCH --output jupyter-notebook-%J.log # get tunneling info XDG_RUNTIME_DIR = \"\" port = $( shuf -i8000-9999 -n1 ) node = $( hostname -s ) user = $( whoami ) cluster = $( hostname -f | awk -F \".\" '{print $2}' ) # print tunneling instructions jupyter-log echo -e \" For more info and how to connect from windows, see https://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/ MacOS or linux terminal command to create your ssh tunnel ssh -N -L ${ port } : ${ node } : ${ port } ${ user } @ ${ cluster } .ycrc.yale.edu Windows MobaXterm info Forwarded port:same as remote port Remote server: ${ node } Remote port: ${ port } SSH server: ${ cluster } .ycrc.yale.edu SSH login: $user SSH port: 22 Use a Browser on your local machine to go to: localhost: ${ port } (prefix w/ https:// if using password) \" # load modules or conda environments here jupyter-notebook --no-browser --port = ${ port } --ip = ${ node }","title":"Start the Server"},{"location":"clusters-at-yale/guides/jupyter_ssh/#start-the-tunnel","text":"Once you have submitted your job and it starts, your notebook server will be ready for you to connect. You can run squeue -u${USER} to check. You will see an \"R\" in the ST or status column for your notebook job if it is running. If you see a \"PD\" in the status column, you will have to wait for your job to start running to connect. The log file with information about how to connect will be in the directory you submitted the script from, and be named jupyter-notebook-[jobid].log where jobid is the slurm id for your job.","title":"Start the Tunnel"},{"location":"clusters-at-yale/guides/jupyter_ssh/#macos-and-linux","text":"On a Mac or Linux machine, you can start the tunnel with an SSH command. 
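For example, with placeholder values (treat the node, port, cluster, and username here as hypothetical; use the exact command printed in your job's log file): ssh -N -L 9230:c14n06:9230 netid@grace.ycrc.yale.edu # keep this command running for as long as you need the notebook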
You can check the output from the job you started to get the specifc info you need.","title":"MacOS and Linux"},{"location":"clusters-at-yale/guides/jupyter_ssh/#windows","text":"On a Windows machine, we recommend you use MobaXterm. See our guide on connecting with MobaXterm for instructions on how to get set up. You will need to take a look at your job's log file to get the details you need. Then start MobaXterm: Under Tools choose \"MobaSSHTunnel (port forwarding)\". Click the \"New SSH Tunnel\" button. Click the radio button for \"Local port forwarding\". Use the information in your jupyter notebook log file to fill out the boxes. Click Save. On your new tunnel, click the key symbol under the settings column and choose your ssh private key. Click the play button under the Start/Stop column.","title":"Windows"},{"location":"clusters-at-yale/guides/jupyter_ssh/#browse-the-notebook","text":"Finally, open a web browser on your local machine and enter the address http://localhost:port where port is the one specified in your log file. The address Jupyter creates by default (the one with the name of a compute node) will not work outside the cluster's network. Since version 5 of jupyter, the notebook will automatically generate a token that allows you to authenticate when you connect. It is long, and will be at the end of the url jupyter generates. It will look something like http://c14n06:9230/?token=**ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20** If you run into trouble or need help, contact us .","title":"Browse the Notebook"},{"location":"clusters-at-yale/guides/mathematica/","text":"Mathematica Open OnDemand We strongly recommend using Open OnDemand to launch Mathematica. First, open OOD in a browser and navigate to the Apps button. Select All Apps from the drop-down menu and then select Mathematica from the list. Fill in your resource requests and launch your job. Once started, click Launch Mathematica and Mathematica will be opened in a new tab in the browser. Interactive Job Alternatively, you could start an interacgive session with X11 forwarding. Warning The Mathematica program is too large to fit on a login node. If you try to run it there, it will crash. Instead, launch it in an interactive job (see below). To run Mathematica interactively, you need to request an interactive session on a compute node. You could start an interactive session using Slurm. For example, to use 4 cores on 1 node: salloc --x11 -c 4 -t 4:00:00 Note that if you are on macOS, you will need to install an additional program to use the GUI. See our X11 Forwarding documentation for instructions. See our Slurm documentation for more detailed information on requesting resources for interactive jobs. To launch Mathematica, you will first need to make sure you have the correct module loaded. You can search for all available Mathematica versions: module avail mathematica Load the appropriate module file. For example, to run version 12.0.0: module load Mathematica/12.0.0 The module load command sets up your environment, including the PATH to find the proper version of the Mathematica program. If you would like to avoid running the load command every session, you can run module save and then the Mathematica module will be loaded every time you login. Once you have the appropriate module loaded in an interactive job, start Mathematica. The & will put the program in the background so you can continue to use your terminal session. 
Mathematica & Configure Environment for Parallel Jobs Mathematica installed on Yale HPC clusters includes our proprietary scripts to run parallel jobs in SLURM environments. These scripts are designed in a way to allow users to access up to 450 parallel kernels. When a user asks for a specific number of kernels, the wait time to get them might differ dramatically depending on requested computing resources as well as on how busy the HPC cluster is at that moment. To reduce waiting time, our scripts try to launch as many kernels as possible at the moment the user asks for them. Most of the time you will not get launched with the same number of kernels as you requested. We recommend checking the final number of parallel kernels you\u2019ve gotten after the launching command has completed no matter if you run a Front End Mathematica session or execute Wolfram script. One of the ways to check this would be the Mathematica command Length[Kernels[]] . In order to run parallel Mathematica jobs on our cluster, you will need to configure your Mathematica environment. You have to do this within a Front End session. If you run Wolfram script you need to run a Front End session to set your parallel environment before executing your script. Once Mathematica is started, open a new document in the Mathematica window and go to Edit > Preferences . From there, go to Evaluate/Parallel Kernel Configuration and change the following settings: Under Local Kernels , disable Local Kernels if it is enabled Go in Cluster Integration and first enable cluster integration it if it is not enabled Under the Cluster Integration tab, expand the Advanced Settings arrow. When you configure parallel kernels for the first time, please select SLURM from the Cluster Engine pull-down menu Matching parallel kernel versions with your main Mathematica version is important, especially if you\u2019ve already had SLURM selected by running different Mathematica versions previously (you might see different versions in Kernel program) In this case, select Windows CCS from Cluster Engine and a red error will appear in Advanced Settings. After this select SLURM again as this should set the correct engine for you. Under Kernels , set your desired number (we recommend to set it lower first to test) In Advanced Settings under Native specification , specify time and RAM per kernel, such as \u2014time=02:00:00 \u2014mem=20G (please note that this is RAM per one kernel) If you are using Mathematica 12.3 and above, and if RemoteKernel Objects is enabled, disable it and restart your Mathematica session We recommend to use these commands to start kernels and to check how many kernels have actually been launched (please keep them in the same Mathematica cell and separate by semicolons; Do not use semicolon at the end) $DefaultKernels=$ConfiguredKernels; LaunchKernels[]; Length[Kernels[]] Request Help or Access to Wolfram Alpha Pro If you need any assistance with your Mathematica program, contact us .","title":"Mathematica"},{"location":"clusters-at-yale/guides/mathematica/#mathematica","text":"","title":"Mathematica"},{"location":"clusters-at-yale/guides/mathematica/#open-ondemand","text":"We strongly recommend using Open OnDemand to launch Mathematica. First, open OOD in a browser and navigate to the Apps button. Select All Apps from the drop-down menu and then select Mathematica from the list. Fill in your resource requests and launch your job. 
Once started, click Launch Mathematica and Mathematica will be opened in a new tab in the browser.","title":"Open OnDemand"},{"location":"clusters-at-yale/guides/mathematica/#interactive-job","text":"Alternatively, you could start an interacgive session with X11 forwarding. Warning The Mathematica program is too large to fit on a login node. If you try to run it there, it will crash. Instead, launch it in an interactive job (see below). To run Mathematica interactively, you need to request an interactive session on a compute node. You could start an interactive session using Slurm. For example, to use 4 cores on 1 node: salloc --x11 -c 4 -t 4:00:00 Note that if you are on macOS, you will need to install an additional program to use the GUI. See our X11 Forwarding documentation for instructions. See our Slurm documentation for more detailed information on requesting resources for interactive jobs. To launch Mathematica, you will first need to make sure you have the correct module loaded. You can search for all available Mathematica versions: module avail mathematica Load the appropriate module file. For example, to run version 12.0.0: module load Mathematica/12.0.0 The module load command sets up your environment, including the PATH to find the proper version of the Mathematica program. If you would like to avoid running the load command every session, you can run module save and then the Mathematica module will be loaded every time you login. Once you have the appropriate module loaded in an interactive job, start Mathematica. The & will put the program in the background so you can continue to use your terminal session. Mathematica &","title":"Interactive Job"},{"location":"clusters-at-yale/guides/mathematica/#configure-environment-for-parallel-jobs","text":"Mathematica installed on Yale HPC clusters includes our proprietary scripts to run parallel jobs in SLURM environments. These scripts are designed in a way to allow users to access up to 450 parallel kernels. When a user asks for a specific number of kernels, the wait time to get them might differ dramatically depending on requested computing resources as well as on how busy the HPC cluster is at that moment. To reduce waiting time, our scripts try to launch as many kernels as possible at the moment the user asks for them. Most of the time you will not get launched with the same number of kernels as you requested. We recommend checking the final number of parallel kernels you\u2019ve gotten after the launching command has completed no matter if you run a Front End Mathematica session or execute Wolfram script. One of the ways to check this would be the Mathematica command Length[Kernels[]] . In order to run parallel Mathematica jobs on our cluster, you will need to configure your Mathematica environment. You have to do this within a Front End session. If you run Wolfram script you need to run a Front End session to set your parallel environment before executing your script. Once Mathematica is started, open a new document in the Mathematica window and go to Edit > Preferences . From there, go to Evaluate/Parallel Kernel Configuration and change the following settings: Under Local Kernels , disable Local Kernels if it is enabled Go in Cluster Integration and first enable cluster integration it if it is not enabled Under the Cluster Integration tab, expand the Advanced Settings arrow. 
When you configure parallel kernels for the first time, please select SLURM from the Cluster Engine pull-down menu Matching parallel kernel versions with your main Mathematica version is important, especially if you\u2019ve already had SLURM selected by running different Mathematica versions previously (you might see different versions in Kernel program) In this case, select Windows CCS from Cluster Engine and a red error will appear in Advanced Settings. After this select SLURM again as this should set the correct engine for you. Under Kernels , set your desired number (we recommend to set it lower first to test) In Advanced Settings under Native specification , specify time and RAM per kernel, such as \u2014time=02:00:00 \u2014mem=20G (please note that this is RAM per one kernel) If you are using Mathematica 12.3 and above, and if RemoteKernel Objects is enabled, disable it and restart your Mathematica session We recommend to use these commands to start kernels and to check how many kernels have actually been launched (please keep them in the same Mathematica cell and separate by semicolons; Do not use semicolon at the end) $DefaultKernels=$ConfiguredKernels; LaunchKernels[]; Length[Kernels[]]","title":"Configure Environment for Parallel Jobs"},{"location":"clusters-at-yale/guides/mathematica/#request-help-or-access-to-wolfram-alpha-pro","text":"If you need any assistance with your Mathematica program, contact us .","title":"Request Help or Access to Wolfram Alpha Pro"},{"location":"clusters-at-yale/guides/matlab/","text":"MATLAB MATLAB GUI To use the MATLAB GUI, we recommend our web portal, Open OnDemand . Once logged in, click MATLAB pinned on the dashboard, or select \"MATLAB\" from the \"Interactive Apps\" list. Command Line MATLAB Find MATLAB Run one of the commands below, which will list available versions and the corresponding module files: module avail matlab Load the appropriate module file. For example, to run version R2021a: module load MATLAB/2021a The module load command sets up your environment, including the PATH to find the proper version of the MATLAB program. Run MATLAB Warning If you try to run MATLAB on a login node, it will likely crash. Instead, launch it in an interactive or batch job (see below). Interactive Job (without a GUI) To run MATLAB interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores, 16GiB of RAM for 4 hours with: salloc -c 4 --mem 16G -t 4:00:00 Once your interactive session starts, you can load the appropriate module file and start MATLAB module load MATLAB/2021a # launch the MATLAB command line prompt maltab -nodisplay # launch a script on the command line matlab -nodisplay < runscript.m See our Slurm documentation for more detailed information on requesting resources for interactive jobs. Batch Mode (without a GUI) Create a batch script with the resource requests appropriate to your MATLAB function(s) and script(s). In it load the MATLAB module version you want, then run matlab with the -b option and your function/script name. 
Here is an example that requests 4 CPUs and 18GiB of memory for 8 hours: #!/bin/bash #SBATCH --job-name myjob #SBATCH --cpus-per-task 4 #SBATCH --mem 18G #SBATCH -t 8:00:00 module load MATLAB/2021a # assuming you have your_script.m in the current directory matlab -batch \"your_script\" # if using MATLAB older than R2019a # matlab -nojvm -nodisplay -nosplash < your_script.m Using More than 12 Cores with MATLAB In MATLAB, 12 workers is a poorly documented default limit (seemingly for historical reasons) when setting up the parallel environment. You can override it by explicitly setting up your parpool before calling parfor or other parallel functions. parpool(feature('NumCores'));","title":"MATLAB"},{"location":"clusters-at-yale/guides/matlab/#matlab","text":"","title":"MATLAB"},{"location":"clusters-at-yale/guides/matlab/#matlab-gui","text":"To use the MATLAB GUI, we recommend our web portal, Open OnDemand . Once logged in, click MATLAB pinned on the dashboard, or select \"MATLAB\" from the \"Interactive Apps\" list.","title":"MATLAB GUI"},{"location":"clusters-at-yale/guides/matlab/#command-line-matlab","text":"","title":"Command Line MATLAB"},{"location":"clusters-at-yale/guides/matlab/#find-matlab","text":"Run one of the commands below, which will list available versions and the corresponding module files: module avail matlab Load the appropriate module file. For example, to run version R2021a: module load MATLAB/2021a The module load command sets up your environment, including the PATH to find the proper version of the MATLAB program.","title":"Find MATLAB"},{"location":"clusters-at-yale/guides/matlab/#run-matlab","text":"Warning If you try to run MATLAB on a login node, it will likely crash. Instead, launch it in an interactive or batch job (see below).","title":"Run MATLAB"},{"location":"clusters-at-yale/guides/matlab/#interactive-job-without-a-gui","text":"To run MATLAB interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores, 16GiB of RAM for 4 hours with: salloc -c 4 --mem 16G -t 4:00:00 Once your interactive session starts, you can load the appropriate module file and start MATLAB module load MATLAB/2021a # launch the MATLAB command line prompt matlab -nodisplay # launch a script on the command line matlab -nodisplay < runscript.m See our Slurm documentation for more detailed information on requesting resources for interactive jobs.","title":"Interactive Job (without a GUI)"},{"location":"clusters-at-yale/guides/matlab/#batch-mode-without-a-gui","text":"Create a batch script with the resource requests appropriate to your MATLAB function(s) and script(s). In it, load the MATLAB module version you want, then run matlab with the -batch option and your function/script name. Here is an example that requests 4 CPUs and 18GiB of memory for 8 hours: #!/bin/bash #SBATCH --job-name myjob #SBATCH --cpus-per-task 4 #SBATCH --mem 18G #SBATCH -t 8:00:00 module load MATLAB/2021a # assuming you have your_script.m in the current directory matlab -batch \"your_script\" # if using MATLAB older than R2019a # matlab -nojvm -nodisplay -nosplash < your_script.m","title":"Batch Mode (without a GUI)"},{"location":"clusters-at-yale/guides/matlab/#using-more-than-12-cores-with-matlab","text":"In MATLAB, 12 workers is a poorly documented default limit (seemingly for historical reasons) when setting up the parallel environment. You can override it by explicitly setting up your parpool before calling parfor or other parallel functions. 
parpool(feature('NumCores'));","title":"Using More than 12 Cores with MATLAB"},{"location":"clusters-at-yale/guides/mpi4py/","text":"MPI Parallelism with Python Note Before venturing into MPI-based parallelism, consider whether your work can be restructured to make use of dSQ or more \"embarrassingly parallel\" workflows. MPI can be thought of as a \"last resort\" for parallel programming. There are many computational problems that can achieve increased performance by running pieces in parallel. These often require communication between the different steps and need a way to send messages between processes. Examples of this include simulations of galaxy formation and electric field simulations, analysis of a single large dataset, or complex search or sort algorithms. MPI and mpi4py There is a standard protocol, called MPI , that defines how messages are passed between processes, including one-to-one and broadcast communications. The Python module for this is called mpi4py : mpi4py Read The Docs Message Passing Interface implemented for Python. Supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects) We will go over a few simple examples here. Definitions COMM : The communication \"world\" defined by MPI RANK : an ID number given to each internal process to define communication SIZE : total number of processes allocated BROADCAST : One-to-many communication SCATTER : One-to-many data distribution GATHER : Many-to-one data distribution mpi4py on the clusters On the clusters, the easiest way to start using mpi4py is to use the module-based software for OpenMPI and Python: # toolchains 2020b and before module load SciPy-bundle/2020.11-foss-2020b # toolchains starting with 2022b module load mpi4py/3.1.4-gompi-2022b Warning mpi4py installed via Conda is unaware of the cluster infrastructure and therefore will likely only work on a single compute node. If you wish to get a conda environment working across multiple nodes, please reach out to hpc@yale.edu for assistance. Cluster Resource Requests MPI utilizes Slurm tasks as the individual parallel workers. Therefore, when requesting resources (either interactively or in batch-mode) the number of tasks will determine the number of parallel workers (or to use MPI's language, the SIZE of the COMM World ). To request four tasks (each with a single CPU) interactively, run the following: salloc --cpus-per-task = 1 --ntasks = 4 This can also be achieved in batch-mode by including the following directives in your submission script: #SBATCH --cpus-per-task=1 #SBATCH --ntasks=4 A more detailed discussion of resource requests can be found here and further examples are available here . Examples Ex 1: Rank This is a simple example where each worker reports their RANK and the process ID running that particular task. from mpi4py import MPI import os # instantiate the communication world comm = MPI . COMM_WORLD # get the size of the communication world size = comm . Get_size () # get this particular process's `rank` ID rank = comm . Get_rank () PID = os . getpid () print ( f 'rank: { rank } has PID: { PID } ' ) We then execute this code (named mpi_simple.py ) by running the following on the command line: mpirun -n 4 python mpi_simple.py The mpirun command is a wrapper for the MPI interface. Then we tell it to set up a COMM_WORLD with 4 workers. 
Finally we tell mpirun to run python mpi_simple.py on each of the four workers. Which outputs the following: rank : 0 has PID : 89134 rank : 1 has PID : 89135 rank : 2 has PID : 89136 rank : 3 has PID : 89137 Ex 2: Point to Point Communicators The most basic communication operators are \" send \" and \" recv \". These can be a bit tricky since they are \"blocking\" commands and can cause the program to hang. comm . send ( obj , dest , tag = 0 ) comm . recv ( source = MPI . ANY_SOURCE , tag = MPI . ANY_TAG , status = None ) tag can be used as a filter dest must be a rank in the current communicator source can be a rank or a wild-card ( MPI.ANY_SOURCE ) status used to retrieve information about recv'd message We now we create a file ( mpi_comm.py ) that contains the following: from mpi4py import MPI comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () if rank == 0 : msg = 'Hello, world' comm . send ( msg , dest = 1 ) elif rank == 1 : s = comm . recv () print ( f \"rank { rank } : { s } \" ) When we run this on the command line ( mpirun -n 4 python mpi_comm.py ) we get the following: rank 1: Hello, world The RANK=0 process sends the message, and the RANK=1 process receives it. The other two processes are effectively bystanders in this example. Ex 3: Broadcast Now we will try a slightly more complicated example that involves sending messages and data between processes. # Import MPI from mpi4py import MPI # Define world comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () # Create some data in the RANK_0 worker if rank == 0 : data = { 'key1' : [ 7 , 2.72 , 2 + 3 j ], 'key2' : ( 'abc' , 'xyz' )} else : data = None # Broadcast the data from RANK_0 to all workers data = comm . bcast ( data , root = 0 ) # Append the RANK ID to the data data [ 'key1' ] . append ( rank ) # Print the resulting data print ( f \"Rank: { rank } , data: { data } \" ) We then execute this code (named mpi_message.py ) by running the following on the command line: mpirun -n 4 python mpi_message.py Which outputs the following: Rank : 0 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 0 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 2 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 2 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 3 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 3 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 1 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 1 ], 'key2' : ( 'abc' , 'xyz' )} Ex 4: Scatter and Gather An effective way of distributing computationally intensive tasks is to scatter pieces of a large dataset to each task. The separate tasks perform some analysis on their chunk of data and then the results are gathered by RANK_0 . This example takes a large array of random numbers and splits it into pieces for each task. These smaller datasets are analyzed (taking an average in this example) and the results are returns to the main task with a Gather call. # import libraries from mpi4py import MPI import numpy as np # set up MPI world comm = MPI . COMM_WORLD size = comm . Get_size () # new: gives number of ranks in comm rank = comm . Get_rank () # generate a large array of data on RANK_0 numData = 100000000 # 100milion values each data = None if rank == 0 : data = np . random . normal ( loc = 10 , scale = 5 , size = numData ) # initialize empty arrays to receive the partial data partial = np . empty ( int ( numData / size ), dtype = 'd' ) # send data to the other workers comm . 
Scatter ( data , partial , root = 0 ) # prepare the reduced array to receive the processed data reduced = None if rank == 0 : reduced = np . empty ( size , dtype = 'd' ) # Average the partial arrays, and then gather them to RANK_0 comm . Gather ( np . average ( partial ), reduced , root = 0 ) if rank == 0 : print ( 'Full Average:' , np . average ( reduced )) This is executed on the command line: mpirun -n 4 python mpi/mpi_scatter.py Which prints: Full Average: 10.00002060397186 Key Take-aways and Further Reading MPI is a powerful tool to set up communication worlds and send data and messages between workers The mpi4py module provides tools for using MPI within Python. This is just the beginning, mpi4py can be used for so much more... To learn more, take a look at the mpi4py tutorial here .","title":"MPI with Python"},{"location":"clusters-at-yale/guides/mpi4py/#mpi-parallelism-with-python","text":"Note Before venturing into MPI-based parallelism, consider whether your work can be resturctured to make use of dSQ or more \"embarrassingly parallel\" workflows. MPI can be thought of as a \"last resort\" for parallel programming. There are many computational problems that can be have increased performance by running pieces in parallel. These often require communication between the different steps and need a way to send messages between processes. Examples of this include simulations of galaxy formation and electric field simulations, analysis of a single large dataset, or complex search or sort algorithms.","title":"MPI Parallelism with Python"},{"location":"clusters-at-yale/guides/mpi4py/#mpi-and-mpi4py","text":"There is a standard protocol, called MPI , that defines how messages are passed between processes, including one-to-one and broadcast communications. The Python module for this is called mpi4py : mpi4py Read The Docs Message Passing Interface implemented for Python. Supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects) We will go over a few simple examples here.","title":"MPI and mpi4py"},{"location":"clusters-at-yale/guides/mpi4py/#definitions","text":"COMM : The communication \"world\" defined by MPI RANK : an ID number given to each internal process to define communication SIZE : total number of processes allocated BROADCAST : One-to-many communication SCATTER : One-to-many data distribution GATHER : Many-to-one data distribution","title":"Definitions"},{"location":"clusters-at-yale/guides/mpi4py/#mpi4py-on-the-clusters","text":"On the clusters, the easiest way to start using mpi4py is to use the module-based software for OpenMPI and Python: # toolchains 2020b and before module load SciPy-bundle/2020.11-foss-2020b # toolchains starting with 2022b module load mpi4py/3.1.4-gompi-2022b Warning mpi4py installed via Conda is unaware of the cluster infrastructure and therefore will likely only work on a single compute node. If you wish to get a conda environment working across multiple nodes, please reach out to hpc@yale.edu for assistance.","title":"mpi4py on the clusters"},{"location":"clusters-at-yale/guides/mpi4py/#cluster-resource-requests","text":"MPI utilizes Slurm tasks as the individual parallel workers. 
Therefore, when requesting resources (either interactively or in batch-mode) the number of tasks will determine the number of parallel workers (or to use MPI's language, the SIZE of the COMM World ). To request four tasks (each with a single CPU) interactively run the following: salloc --cpus-per-task = 1 --ntasks = 4 This can also be achieved in batch-mode by including the following directives in your submission script: #SBATCH --cpus-per-task=1 #SBATCH --ntasks=4 A more detailed discussion of resource requests can be found here and further examples are available here .","title":"Cluster Resource Requests"},{"location":"clusters-at-yale/guides/mpi4py/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/guides/mpi4py/#ex-1-rank","text":"This is a simple example where each worker reports their RANK and the process ID running that particular task. from mpi4py import MPI # instantize the communication world comm = MPI . COMM_WORLD # get the size of the communication world size = comm . Get_size () # get this particular processes' `rank` ID rank = comm . Get_rank () PID = os . getpid () print ( f 'rank: { rank } has PID: { PID } ' ) We then execute this code (named mpi_simple.py ) by running the following on the command line: mpirun -n 4 python mpi_simple.py The mpirun command is a wrapper for the MPI interface. Then we tell that to set up a COMM_WORLD with 4 workers. Finally we tell mpirun to run python mpi_simple.py on each of the four workers. Which outputs the following: rank : 0 has PID : 89134 rank : 1 has PID : 89135 rank : 2 has PID : 89136 rank : 3 has PID : 89137","title":"Ex 1: Rank"},{"location":"clusters-at-yale/guides/mpi4py/#ex-2-point-to-point-communicators","text":"The most basic communication operators are \" send \" and \" recv \". These can be a bit tricky since they are \"blocking\" commands and can cause the program to hang. comm . send ( obj , dest , tag = 0 ) comm . recv ( source = MPI . ANY_SOURCE , tag = MPI . ANY_TAG , status = None ) tag can be used as a filter dest must be a rank in the current communicator source can be a rank or a wild-card ( MPI.ANY_SOURCE ) status used to retrieve information about recv'd message We now we create a file ( mpi_comm.py ) that contains the following: from mpi4py import MPI comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () if rank == 0 : msg = 'Hello, world' comm . send ( msg , dest = 1 ) elif rank == 1 : s = comm . recv () print ( f \"rank { rank } : { s } \" ) When we run this on the command line ( mpirun -n 4 python mpi_comm.py ) we get the following: rank 1: Hello, world The RANK=0 process sends the message, and the RANK=1 process receives it. The other two processes are effectively bystanders in this example.","title":"Ex 2: Point to Point Communicators"},{"location":"clusters-at-yale/guides/mpi4py/#ex-3-broadcast","text":"Now we will try a slightly more complicated example that involves sending messages and data between processes. # Import MPI from mpi4py import MPI # Define world comm = MPI . COMM_WORLD size = comm . Get_size () rank = comm . Get_rank () # Create some data in the RANK_0 worker if rank == 0 : data = { 'key1' : [ 7 , 2.72 , 2 + 3 j ], 'key2' : ( 'abc' , 'xyz' )} else : data = None # Broadcast the data from RANK_0 to all workers data = comm . bcast ( data , root = 0 ) # Append the RANK ID to the data data [ 'key1' ] . 
append ( rank ) # Print the resulting data print ( f \"Rank: { rank } , data: { data } \" ) We then execute this code (named mpi_message.py ) by running the following on the command line: mpirun -n 4 python mpi_message.py Which outputs the following: Rank : 0 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 0 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 2 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 2 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 3 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 3 ], 'key2' : ( 'abc' , 'xyz' )} Rank : 1 , data : { 'key1' : [ 7 , 2.72 , ( 2 + 3 j ), 1 ], 'key2' : ( 'abc' , 'xyz' )}","title":"Ex 3: Broadcast"},{"location":"clusters-at-yale/guides/mpi4py/#ex-4-scatter-and-gather","text":"An effective way of distributing computationally intensive tasks is to scatter pieces of a large dataset to each task. The separate tasks perform some analysis on their chunk of data and then the results are gathered by RANK_0 . This example takes a large array of random numbers and splits it into pieces for each task. These smaller datasets are analyzed (taking an average in this example) and the results are returns to the main task with a Gather call. # import libraries from mpi4py import MPI import numpy as np # set up MPI world comm = MPI . COMM_WORLD size = comm . Get_size () # new: gives number of ranks in comm rank = comm . Get_rank () # generate a large array of data on RANK_0 numData = 100000000 # 100milion values each data = None if rank == 0 : data = np . random . normal ( loc = 10 , scale = 5 , size = numData ) # initialize empty arrays to receive the partial data partial = np . empty ( int ( numData / size ), dtype = 'd' ) # send data to the other workers comm . Scatter ( data , partial , root = 0 ) # prepare the reduced array to receive the processed data reduced = None if rank == 0 : reduced = np . empty ( size , dtype = 'd' ) # Average the partial arrays, and then gather them to RANK_0 comm . Gather ( np . average ( partial ), reduced , root = 0 ) if rank == 0 : print ( 'Full Average:' , np . average ( reduced )) This is executed on the command line: mpirun -n 4 python mpi/mpi_scatter.py Which prints: Full Average: 10.00002060397186","title":"Ex 4: Scatter and Gather"},{"location":"clusters-at-yale/guides/mpi4py/#key-take-aways-and-further-reading","text":"MPI is a powerful tool to set up communication worlds and send data and messages between workers The mpi4py module provides tools for using MPI within Python. This is just the beginning, mpi4py can be used for so much more... To learn more, take a look at the mpi4py tutorial here .","title":"Key Take-aways and Further Reading"},{"location":"clusters-at-yale/guides/mysql/","text":"Mysql Mysql is a popular relational database. Because a database is usually thought of as a persistent service, it is not ordinarily run on HPC clusters, since allocations on an HPC cluster are temporary. If you need a persistent mysql database server, we recommend either installing mysql on a server in your lab, or using ITS's Spinup service. In either case, the mysql server can be accessed remotely from the HPC clusters. However, there are some use cases for running a mysql server on the cluster that do make sense. For example, some applications store their data in a mysql database that only needs to run when the application runs. Most instructions for installing mysql involve creating a persistent server and require admin privileges. 
The instructions that follow walk you through the process of running a mysql server using Apptainer on a cluster compute node without any special privileges. It uses an Apptainer container developed by Robert Grandin at Iowa State (Thanks!) All of the following must be done on an allocated compute node. Do not do this on the login node! Step 1: Create an installation directory somewhere, and cd to it mkdir ~/project/mysql cd ~/project/mysql Step 2: Create two config files Put the following in ~/.my.cnf. Note that you should change the password in both files to something else. [mysqld] innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] user=root password='my-secret-pw' Put the following in ~/.mysqlrootpw SET PASSWORD FOR 'root'@'localhost' = PASSWORD('my-secret-pw'); Step 3: Create data directories for mysql mkdir -p ${PWD}/mysql/var/lib/mysql ${PWD}/mysql/run/mysqld Step 4: Make a link to the mysql image file The mysqld image file can be found under the apps tree on each cluster. For example, on Grace: /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg We recommend that you make a link to it in your mysql directory: ln -s /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg mysql.simg Step 5: Start the container. Note that this doesn't actually start the service yet. apptainer instance start --bind ${HOME} \\ --bind ${PWD}/mysql/var/lib/mysql/:/var/lib/mysql \\ --bind ${PWD}/mysql/run/mysqld:/run/mysqld \\ ./mysql.simg mysql To check that it is running: apptainer instance list Step 6: Start the mysqld server within the container apptainer run instance://mysql You'll see lots of output, but at the end you should see a message like this 2022-02-21T17:16:21.104527Z 0 [Note] mysqld: ready for connections. Version: '5.7.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL) Step 7: Enter the running container apptainer exec instance://mysql /bin/bash Connect locally as root user while in the container, using the password you set in the config files in step 2. Singularity> mysql -u root -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \\g. Your MySQL connection id is 3 Server version: 5.7.21 MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement. mysql> Success! The server is working! Type exit to get out of mysql, but remain in the container: Step 8: Add a database user and permit it to login remotely Next, in order to connect from outside the container, you need to add a user that is allowed to connect remotely and give that user permissions. This is one way to do that from the container shell. You should probably substitute your name for elmerfudd and a better password for mypasswd! Singularity> mysql -e \"GRANT ALL PRIVILEGES ON *.* TO 'elmerfudd'@'%' IDENTIFIED BY 'mypasswd' WITH GRANT OPTION\" Singularity> mysql -e \"FLUSH PRIVILEGES\" Type exit to leave the container. From that compute node, but outside the container, try connecting with: mysql -u elmerfudd -h 127.0.0.1 -p Now try connecting to that server from a different compute node by using the hostname of the node where the server is running (e.g. 
c22n01) instead of 127.0.0.1 mysql -u elmerfudd -h c22n01 -p While connected, you can try actually using the server in the usual way to create a database and table: MySQL [(none)]> create database rob; Query OK, 1 row affected (0.00 sec) MySQL [(none)]> use rob Database changed MySQL [rob]> create table users (name VARCHAR(20), id INT); Query OK, 0 rows affected (0.11 sec) ... Success! You've earned a reward of your choice! Step 9 Shut the container down. apptainer instance stop mysql Now that everything is installed, the next time you want to start the server, you'll only need to do steps 5 (starting the container) and 6 (starting the mysql server). Note that you'll run into a problem if two mysql instances are run on the same compute node, since by default they each try to use port 3306. The simplest solution is to specify a non-standard port in your .my.cnf file: [mysqld] port=3310 innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] port=3310 user=root password='my-secret-pw'","title":"Mysql"},{"location":"clusters-at-yale/guides/mysql/#mysql","text":"Mysql is a popular relational database. Because a database is usually thought of as a persistent service, it is not ordinarily run on HPC clusters, since allocations on an HPC cluster are temporary. If you need a persistent mysql database server, we recommend either installing mysql on a server in your lab, or using ITS's Spinup service. In either case, the mysql server can be accessed remotely from the HPC clusters. However, there are some use cases for running a mysql server on the cluster that do make sense. For example, some applications store their data in a mysql database that only needs to run when the application runs. Most instructions for installing mysql involve creating a persistent server and require admin privileges. The instructions that follow walk you through the process of running a mysql server using Apptainer on a cluster compute node without any special privileges. It uses an Apptainer container developed by Robert Grandin at Iowa State (Thanks!) All of the following must be done on an allocated compute node. Do not do this on the login node!","title":"Mysql"},{"location":"clusters-at-yale/guides/mysql/#step-1-create-an-installation-directory-somewhere-and-cd-to-it","text":"mkdir ~/project/mysql cd ~/project/mysql","title":"Step 1: Create an installation directory somewhere, and cd to it"},{"location":"clusters-at-yale/guides/mysql/#step-2-create-two-config-files","text":"Put the following in ~/.my.cnf. Note that you should change the password in both files to something else. [mysqld] innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] user=root password='my-secret-pw' Put the following in ~/.mysqlrootpw SET PASSWORD FOR 'root'@'localhost' = PASSWORD('my-secret-pw');","title":"Step 2: Create two config files"},{"location":"clusters-at-yale/guides/mysql/#step-3-create-data-directories-for-mysql","text":"mkdir -p ${PWD}/mysql/var/lib/mysql ${PWD}/mysql/run/mysqld","title":"Step 3: Create data directories for mysql"},{"location":"clusters-at-yale/guides/mysql/#step-4-make-a-link-to-the-mysql-image-file","text":"The mysqld image file can be found under the apps tree on each cluster. 
For example, on Grace: /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg We recommend that you make a link to it in your mysql directory: ln -s /vast/palmer/apps/apptainer/images/mysqld-5.7.21.simg mysql.simg","title":"Step 4: Make a link to the mysql image file"},{"location":"clusters-at-yale/guides/mysql/#step-5-start-the-container-note-that-this-doesnt-actually-start-the-service-yet","text":"apptainer instance start --bind ${HOME} \\ --bind ${PWD}/mysql/var/lib/mysql/:/var/lib/mysql \\ --bind ${PWD}/mysql/run/mysqld:/run/mysqld \\ ./mysql.simg mysql To check that it is running: apptainer instance list","title":"Step 5: Start the container. Note that this doesn't actually start the service yet."},{"location":"clusters-at-yale/guides/mysql/#step-6-start-the-mysqld-server-within-the-container","text":"apptainer run instance://mysql You'll see lots of output, but at the end you should see a message like this 2022-02-21T17:16:21.104527Z 0 [Note] mysqld: ready for connections. Version: '5.7.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)","title":"Step 6: Start the mysqld server within the container"},{"location":"clusters-at-yale/guides/mysql/#step-7-enter-the-running-container","text":"apptainer exec instance://mysql /bin/bash Connect locally as root user while in the container, using the password you set in the config files in step 2. Singularity> mysql -u root -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \\g. Your MySQL connection id is 3 Server version: 5.7.21 MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement. mysql> Success! The server is working! Type exit to get out of mysql, but remain in the container:","title":"Step 7: Enter the running container"},{"location":"clusters-at-yale/guides/mysql/#step-8-add-a-database-user-and-permit-it-to-login-remotely","text":"Next, in order to connect from outside the container, you need to add a user that is allowed to connect remotely and give that user permissions. This is one way to do that from the container shell. You should probably substitute your name for elmerfudd and a better password for mypasswd! Singularity> mysql -e \"GRANT ALL PRIVILEGES ON *.* TO 'elmerfudd'@'%' IDENTIFIED BY 'mypasswd' WITH GRANT OPTION\" Singularity> mysql -e \"FLUSH PRIVILEGES\" Type exit to leave the container. From that compute node, but outside the container, try connecting with: mysql -u elmerfudd -h 127.0.0.1 -p Now try connecting to that server from a different compute node by using the hostname of the node where the server is running (e.g. c22n01) instead of 127.0.0.1 mysql -u elmerfudd -h c22n01 -p While connected, you can try actually using the server in the usual way to create a database and table: MySQL [(none)]> create database rob; Query OK, 1 row affected (0.00 sec) MySQL [(none)]> use rob Database changed MySQL [rob]> create table users (name VARCHAR(20), id INT); Query OK, 0 rows affected (0.11 sec) ... Success! 
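Once the setup above is complete, the whole workflow can also run unattended inside a Slurm batch job. The sketch below simply strings together steps 5, 6, and 9 from this guide; the resource requests, the 30-second wait for the server to come up, and the example client command are placeholder assumptions you should adapt.

#!/bin/bash
#SBATCH -c 2
#SBATCH -t 4:00:00

cd ~/project/mysql

# step 5: start the container instance
apptainer instance start --bind ${HOME} \
    --bind ${PWD}/mysql/var/lib/mysql/:/var/lib/mysql \
    --bind ${PWD}/mysql/run/mysqld:/run/mysqld \
    ./mysql.simg mysql

# step 6: start mysqld inside the instance, in the background, and give it time to come up
apptainer run instance://mysql &
sleep 30

# do whatever needs the database here (placeholder client command)
mysql -u elmerfudd -h 127.0.0.1 -p'mypasswd' -e "SHOW DATABASES;"

# step 9: shut the container down when finished
apptainer instance stop mysql

Submit it with sbatch in the usual way.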
You've earned a reward of your choice!","title":"Step 8: Add a database user and permit it to login remotely"},{"location":"clusters-at-yale/guides/mysql/#step-9-shut-the-container-down","text":"apptainer instance stop mysql Now that everything is installed, the next time you want to start the server, you'll only need to do steps 5 (starting the container) and 6 (starting the mysql server). Note that you'll run into a problem if two mysql instances are run on the same compute node, since by default they each try to use port 3306. The simplest solution is to specify a non-standard port in your .my.cnf file: [mysqld] port=3310 innodb_use_native_aio=0 init-file=${HOME}/.mysqlrootpw [client] port=3310 user=root password='my-secret-pw'","title":"Step 9 Shut the container down."},{"location":"clusters-at-yale/guides/namd/","text":"NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations. NAMD uses the popular molecular graphics program VMD , for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.To see a full list of available versions of NAMD on the cluster, run: module avail namd/ As of this writing, the latest installed version is 2.13. Running NAMD on the Cluster To set up NAMD on the cluster, module load NAMD/2.13-multicore for the standard multicore version, or module load NAMD/2.13-multicore-CUDA for the GPU-enabled version (about which there is more information below). NAMD can be run interactively, or as a batch job. To run NAMD interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc --x11 -c 4 -t 4 :00:00 For longer simulations, you will generally want to run non-interactively via a batch job . Parallelization NAMD is most effective when run with parallelization. For running on a single node, namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile where ${SLURM_CPUS_PER_TASK} is set by your \"-c\" job resource request. NAMD uses charm++ parallel objects for multinode parallelization and the program launch uses the charmrun interface. Setting up a multinode run in a way that provides improved performance can be a complicated undertaking. If you wish to run a multinode NAMD job and are not already familiar with MPI, feel free to contact the YCRC staff for assistance. GPUs To use the GPU-accelerated version, request GPU resources for your SLURM job using salloc or via a submission script, and load a CUDA-enabled version of NAMD: module load NAMD/2.13-multicore-CUDA For a single-node run, you will need at least one thread for each GPU you want to use: #SBATCH -c 4 --gpus=4 ... charmrun ++local namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile","title":"NAMD"},{"location":"clusters-at-yale/guides/namd/#namd","text":"NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations. 
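For reference, a single-node multicore run is typically submitted with a short batch script along these lines (a minimal sketch only; the module name and the +p flag follow the usage described later in this guide, the core count and walltime are arbitrary, and YourConfigfile is a placeholder):

#!/bin/bash
#SBATCH -J namd_run
#SBATCH -c 8
#SBATCH -t 24:00:00

module load NAMD/2.13-multicore
namd2 +p ${SLURM_CPUS_PER_TASK} YourConfigfile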
NAMD uses the popular molecular graphics program VMD , for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.To see a full list of available versions of NAMD on the cluster, run: module avail namd/ As of this writing, the latest installed version is 2.13.","title":"NAMD"},{"location":"clusters-at-yale/guides/namd/#running-namd-on-the-cluster","text":"To set up NAMD on the cluster, module load NAMD/2.13-multicore for the standard multicore version, or module load NAMD/2.13-multicore-CUDA for the GPU-enabled version (about which there is more information below). NAMD can be run interactively, or as a batch job. To run NAMD interactively, you need to create an interactive session on a compute node. You could start an interactive session using 4 cores for 4 hours using salloc --x11 -c 4 -t 4 :00:00 For longer simulations, you will generally want to run non-interactively via a batch job .","title":"Running NAMD on the Cluster"},{"location":"clusters-at-yale/guides/namd/#parallelization","text":"NAMD is most effective when run with parallelization. For running on a single node, namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile where ${SLURM_CPUS_PER_TASK} is set by your \"-c\" job resource request. NAMD uses charm++ parallel objects for multinode parallelization and the program launch uses the charmrun interface. Setting up a multinode run in a way that provides improved performance can be a complicated undertaking. If you wish to run a multinode NAMD job and are not already familiar with MPI, feel free to contact the YCRC staff for assistance.","title":"Parallelization"},{"location":"clusters-at-yale/guides/namd/#gpus","text":"To use the GPU-accelerated version, request GPU resources for your SLURM job using salloc or via a submission script, and load a CUDA-enabled version of NAMD: module load NAMD/2.13-multicore-CUDA For a single-node run, you will need at least one thread for each GPU you want to use: #SBATCH -c 4 --gpus=4 ... charmrun ++local namd2 +p ${ SLURM_CPUS_PER_TASK } YourConfigfile","title":"GPUs"},{"location":"clusters-at-yale/guides/parallel/","text":"Parallel GNU Parallel a simple but powerful way to run independent tasks in parallel. Although it is possible to run on multiple nodes, it is simplest to run on multiple cpus of a single node, and that is what we will consider here. Note that what is presented here just scratches the surface of what parallel can do. Basic Examples Loop Let's parallelize the following bash loop that prints the letters a through f using bash's brace expansion : for letter in { a..f } ; do echo $letter done ... which produces the following output: a b c d e f To achieve the same result, parallel starts some number of workers and then runs tasks on them. The number of workers and tasks need not be the same. You specify the number of workers with -j . The tasks can be generated with a list of arguments specified after the separator ::: . For parallel to perform well, you should allocate at least the same number of CPUs as workers with the slurm option --cpus-per-task or more simply -c . salloc -c 4 module load parallel parallel -j 4 \"echo {}\" ::: { a..f } This runs four workers that each run echo , filling in the argument {} with the next item in the list. This produces the output: Nested Loop Let's parallelize the following nested bash loop. for letter in { a..c } do for number in { 1 ..7..2 } do echo $letter $number done done ... 
which produces the following output: a 1 a 2 a 3 b 1 b 2 b 3 c 1 c 2 c 3 You can use the ::: separator with parallel to specify multiple lists of parameters you would like to iterate over. Then you can refer to them by one-based index, e.g. list one is {1} . Using these, you can ask parallel to execute combinations of parameters. Here is a way to recreate the result of the serial bash loop above: parallel -j 4 \"echo {1} {2}\" ::: { a..c } ::: { 1 ..3 } Advanced Examples md5sum You have a number of files scattered throughout a directory tree. Their names end with fastq.gz, e.g. d1/d3/sample3.fastq.gz. You'd like to run md5sum on each, and put the output in a file in the same directory, with a filename ending with .md5sum, e.g. d1/d3/sample3.md5sum. Here is a script that will do that in parallel, using 16 cpus on one node of the cluster: #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } --plus \"echo {}; md5sum {} > {/fastq.gz/md5sum.new}\" ::: $( find . -name \"*.fastq.gz\" -print ) The $(find . -name \"*.fastq.gz\" -print) portion of the command returns all of the files of interest. They will be plugged into the {} in the md5sum command. {/fastq.gz/md5sum.new} does a string replacement on the filename, producing the desired output filename. String replacement requires the --plus flag to parallel, which enables a number of powerful string manipulation features. Finally, we pass -j ${SLURM_CPUS_PER_TASK} so that parallel will use all of the allocated cpus, however many there are. Parameter Sweep You want to run a simulation program that takes a number of input parameters, and you want to sample a variety of values for each parameter. #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } simulate { 1 } { 2 } { 3 } ::: { 1 ..5 } ::: 2 16 ::: { 5 ..50..5 } This will run 100 jobs, each with parameters that vary as : simulate 1 2 5 simulate 1 2 10 simulate 1 2 15 ... simulate 5 16 45 simulate 5 16 50 If simulate doesn't create unique output based on parameters, you can use redirection so you can review results from each task. You'll need to use quotes so that the > is seen as part of the command: parallel -j ${ SLURM_CPUS_PER_TASK } \"simulate {1} {2} {3} > results_{1}_{2}_{3}.out\" ::: $( seq 1 5 ) ::: 2 16 ::: $( seq 5 5 50 )","title":"Parallel"},{"location":"clusters-at-yale/guides/parallel/#parallel","text":"GNU Parallel a simple but powerful way to run independent tasks in parallel. Although it is possible to run on multiple nodes, it is simplest to run on multiple cpus of a single node, and that is what we will consider here. Note that what is presented here just scratches the surface of what parallel can do.","title":"Parallel"},{"location":"clusters-at-yale/guides/parallel/#basic-examples","text":"","title":"Basic Examples"},{"location":"clusters-at-yale/guides/parallel/#loop","text":"Let's parallelize the following bash loop that prints the letters a through f using bash's brace expansion : for letter in { a..f } ; do echo $letter done ... which produces the following output: a b c d e f To achieve the same result, parallel starts some number of workers and then runs tasks on them. The number of workers and tasks need not be the same. You specify the number of workers with -j . The tasks can be generated with a list of arguments specified after the separator ::: . For parallel to perform well, you should allocate at least the same number of CPUs as workers with the slurm option --cpus-per-task or more simply -c . 
salloc -c 4 module load parallel parallel -j 4 \"echo {}\" ::: { a..f } This runs four workers that each run echo , filling in the argument {} with the next item in the list. This produces the output:","title":"Loop"},{"location":"clusters-at-yale/guides/parallel/#nested-loop","text":"Let's parallelize the following nested bash loop. for letter in { a..c } do for number in { 1 ..7..2 } do echo $letter $number done done ... which produces the following output: a 1 a 2 a 3 b 1 b 2 b 3 c 1 c 2 c 3 You can use the ::: separator with parallel to specify multiple lists of parameters you would like to iterate over. Then you can refer to them by one-based index, e.g. list one is {1} . Using these, you can ask parallel to execute combinations of parameters. Here is a way to recreate the result of the serial bash loop above: parallel -j 4 \"echo {1} {2}\" ::: { a..c } ::: { 1 ..3 }","title":"Nested Loop"},{"location":"clusters-at-yale/guides/parallel/#advanced-examples","text":"","title":"Advanced Examples"},{"location":"clusters-at-yale/guides/parallel/#md5sum","text":"You have a number of files scattered throughout a directory tree. Their names end with fastq.gz, e.g. d1/d3/sample3.fastq.gz. You'd like to run md5sum on each, and put the output in a file in the same directory, with a filename ending with .md5sum, e.g. d1/d3/sample3.md5sum. Here is a script that will do that in parallel, using 16 cpus on one node of the cluster: #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } --plus \"echo {}; md5sum {} > {/fastq.gz/md5sum.new}\" ::: $( find . -name \"*.fastq.gz\" -print ) The $(find . -name \"*.fastq.gz\" -print) portion of the command returns all of the files of interest. They will be plugged into the {} in the md5sum command. {/fastq.gz/md5sum.new} does a string replacement on the filename, producing the desired output filename. String replacement requires the --plus flag to parallel, which enables a number of powerful string manipulation features. Finally, we pass -j ${SLURM_CPUS_PER_TASK} so that parallel will use all of the allocated cpus, however many there are.","title":"md5sum"},{"location":"clusters-at-yale/guides/parallel/#parameter-sweep","text":"You want to run a simulation program that takes a number of input parameters, and you want to sample a variety of values for each parameter. #!/bin/bash #SBATCH -c 16 module load parallel parallel -j ${ SLURM_CPUS_PER_TASK } simulate { 1 } { 2 } { 3 } ::: { 1 ..5 } ::: 2 16 ::: { 5 ..50..5 } This will run 100 jobs, each with parameters that vary as : simulate 1 2 5 simulate 1 2 10 simulate 1 2 15 ... simulate 5 16 45 simulate 5 16 50 If simulate doesn't create unique output based on parameters, you can use redirection so you can review results from each task. You'll need to use quotes so that the > is seen as part of the command: parallel -j ${ SLURM_CPUS_PER_TASK } \"simulate {1} {2} {3} > results_{1}_{2}_{3}.out\" ::: $( seq 1 5 ) ::: 2 16 ::: $( seq 5 5 50 )","title":"Parameter Sweep"},{"location":"clusters-at-yale/guides/python/","text":"Python Python is a language and free software distribution that is used for websites, system administration, security testing, and scientific computing, to name a few. On the Yale Clusters there are a couple ways in which you can set up Python environments. The default python provided is the minimal install of Python 2.7 that comes with Red Hat Enterprise Linux 7. We strongly recommend that you use one of the methods below to set up your own python environment. 
The Python Module We provide a Python as a software module . We include frozen versions of many common packages used for scientific computing. Find and Load Python Find the available versions of Python version 3 with: module avail Python/3 To load version 3.7.0: module load Python/3.7.0-foss-2018b To show installed Python packages and their versions for the Python/3.7.0-foss-2018b module: module help Python/3.7.0-foss-2018b Install Packages We recommend against installing python packages with pip after having loaded the Python module. Doing so installs them to your home directory in a way that does not make it clear to other python installs what environment the packages you installed belong to. Instead we recommend using virtualenv or Conda environments. We like conda because of all the additional pre-compiled software it makes available. Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. If you do pip install packages, do so in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. Conda-based Python Environments You can easily set up multiple Python installations side-by-side using the conda command. With Conda you can manage your own packages and dependencies for Python, R, etc. See our guide for more detailed instructions. # install once module load miniconda conda create -n py3_env python = 3 numpy scipy matplotlib ipython jupyter jupyterlab # use later module purge && module load miniconda conda activate py3_env Run Python We will kill Python jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your computation in jobs. See our Slurm documentation for more detailed information on submitting jobs. Interactive Job To run Python interactively, first launch an interactive job on a compute node. If your Python sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start python or ipython on your command prompt. If you are happy with your Python commands, save them to a file which can then be submitted and run as a batch job. Batch Mode To run Python in batch mode, create a plain-text batch script to submit. In that script, you call your Python script. In this case myscript.py is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_python_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load miniconda conda activate py3_env python myscript.py To actually submit the job, run sbatch my_py_job.sh where the batch script above was saved as my_py_job.sh . Jupyter Notebooks You can run Jupyter notebooks & JupyterLab by submitting your notebook server as a job. See our page dedicated to Jupyter for more info.","title":"Python"},{"location":"clusters-at-yale/guides/python/#python","text":"Python is a language and free software distribution that is used for websites, system administration, security testing, and scientific computing, to name a few. On the Yale Clusters there are a couple ways in which you can set up Python environments. The default python provided is the minimal install of Python 2.7 that comes with Red Hat Enterprise Linux 7. 
We strongly recommend that you use one of the methods below to set up your own python environment.","title":"Python"},{"location":"clusters-at-yale/guides/python/#the-python-module","text":"We provide a Python as a software module . We include frozen versions of many common packages used for scientific computing.","title":"The Python Module"},{"location":"clusters-at-yale/guides/python/#find-and-load-python","text":"Find the available versions of Python version 3 with: module avail Python/3 To load version 3.7.0: module load Python/3.7.0-foss-2018b To show installed Python packages and their versions for the Python/3.7.0-foss-2018b module: module help Python/3.7.0-foss-2018b","title":"Find and Load Python"},{"location":"clusters-at-yale/guides/python/#install-packages","text":"We recommend against installing python packages with pip after having loaded the Python module. Doing so installs them to your home directory in a way that does not make it clear to other python installs what environment the packages you installed belong to. Instead we recommend using virtualenv or Conda environments. We like conda because of all the additional pre-compiled software it makes available. Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. If you do pip install packages, do so in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes.","title":"Install Packages"},{"location":"clusters-at-yale/guides/python/#conda-based-python-environments","text":"You can easily set up multiple Python installations side-by-side using the conda command. With Conda you can manage your own packages and dependencies for Python, R, etc. See our guide for more detailed instructions. # install once module load miniconda conda create -n py3_env python = 3 numpy scipy matplotlib ipython jupyter jupyterlab # use later module purge && module load miniconda conda activate py3_env","title":"Conda-based Python Environments"},{"location":"clusters-at-yale/guides/python/#run-python","text":"We will kill Python jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your computation in jobs. See our Slurm documentation for more detailed information on submitting jobs.","title":"Run Python"},{"location":"clusters-at-yale/guides/python/#interactive-job","text":"To run Python interactively, first launch an interactive job on a compute node. If your Python sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start python or ipython on your command prompt. If you are happy with your Python commands, save them to a file which can then be submitted and run as a batch job.","title":"Interactive Job"},{"location":"clusters-at-yale/guides/python/#batch-mode","text":"To run Python in batch mode, create a plain-text batch script to submit. In that script, you call your Python script. In this case myscript.py is in the same directory as the batch script, batch script contents shown below. 
#!/bin/bash #SBATCH -J my_python_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load miniconda conda activate py3_env python myscript.py To actually submit the job, run sbatch my_py_job.sh where the batch script above was saved as my_py_job.sh .","title":"Batch Mode"},{"location":"clusters-at-yale/guides/python/#jupyter-notebooks","text":"You can run Jupyter notebooks & JupyterLab by submitting your notebook server as a job. See our page dedicated to Jupyter for more info.","title":"Jupyter Notebooks"},{"location":"clusters-at-yale/guides/r/","text":"R R is a free software environment for statistical computing and graphics. On the Yale Clusters there are a couple ways in which you can set up your R environment. There is no R executable provided by default; you have to choose one of the following methods to be able to run R. The R Module We provide several versions of R as software modules . These modules provide a broad selection of commonly used packages pre-installed. Notably, this includes a number of geospatial packages like sf , sp , raster , and terra . In addition, we install a collection of the most common bioconductor bioinformatics packages ( homepage ) called R-bundle-Bioconductor . This can be loaded in addition to the matching R module to provide simple access to these tools. Find and Load R Find the available versions of R version 4 with: module avail R/4 To load version 4.2.0: module load R/4.2.0-foss-2020b To show installed R packages and their versions for the R/4.2.0 module: module help R/4.2.0-foss-2020b Between the base R module and the R-bundle-Bioconductor module, there are over 1000 R packages installed and ready to use. To find if your desired package is available in these modules, you can run module spider $PACKAGE/$VERSION : module spider Seurat/4.1.1 -------------------------------------------------------------------------------------------------------------------------------------------------------- Seurat: Seurat/4.1.1 ( E ) -------------------------------------------------------------------------------------------------------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 Names marked by a trailing ( E ) are extensions provided by another module. So to get this version of Seurat, you can load the R-bundle-Bioconductor module. Then you simple library(Seurat) to use that tool. Install Packages The software modules include many commonly used packages, but you can install additional packages specifically for your account. As part of the R software modules we define an environment variable which directs R to install packages to your project space. This helps prevent issues where R cannot install packages due to home-space quotas. To change the location of where R installs packages, the R_LIBS_USER variable can be set in your ~/.bashrc file: export R_LIBS_USER=$GIBBS_PROJECT/R/%v where %v is a placeholder for the R major and minor version number (e.g. 4.2 ). R will replace this variable with the correct value automatically to segregate packages installed with different versions of R. We recommend you install packages in an interactive job with the slurm option -C oldest . This will ensure the compiled portions of your R library are compatible with all compute nodes on the cluster. 
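For example, to have both the base R packages and the Bioconductor collection available in one session, you can load the two matching modules together (the versions shown are the ones used elsewhere in this guide; check module avail for what is currently installed):

module load R/4.2.0-foss-2020b R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0
R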
If there is a missing library your package of interest needs you should be able to load its module. If you cannot find a dependency or have trouble installing an R package, please get in touch with us . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always install packages in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. To get started load the R module and start R: salloc module load R/4.2.0-foss-2020b R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) This will throw a warning like: Warning in install.packages ( \"lattice\" ) : 'lib = \"/ysm-gpfs/apps/software/R/4.2.0-foss-2020b/lib64/R/library\"' is not writable Would you like to create a personal library /gpfs/gibbs/project/support/tl397/R/4.1 to install packages into? ( y/n ) Note If you encounter a permission error because the installation does not prompt you to create a personal library, create the directory in the default location in your home directory for the version of R you are using; e.g., mkdir /path/to/your/project/space/R/4.2 You only need the general minor version such as 4.2 instead of 4.2.2. You can customize where packages are installed and accessed for a particular R session using the .libPaths function in R: # List current package locations > .libPaths() # Add new default location to the standard defaults, e.g. project/my_R_libs > .libPaths(c(\"/home/netID/project/my_R_libs/\", .libPaths())) Run R We will kill R jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your R computation in jobs. See our Slurm documentation for more detailed information on submitting jobs. Interactive Job To run R interactively, first launch an interactive job on a compute node. If your R sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start R by entering R on your command prompt. If you are happy with your R commands, save them to a file which can then be submitted and run as a batch job. Batch Mode To run R in batch mode, create a plain-text batch script to submit. In that script, you can run your R script. In this case myscript.R is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_r_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load R/4.1.0-foss-2020b Rscript myscript.R To actually submit the job, run sbatch my_r_job.sh where the batch script above was saved as my_r_job.sh . RStudio You can run RStudio app via Open Ondemand . Here you can select the desired version of R and RStudio and launch an interactive compute session. Parallel R On a cluster you may want to use R in parallel across multiple nodes. While there are a few different ways this can be achieved, we recommend using the R software module which already includes Rmpi , parallel , and doMC . 
To test it, we can create a simple R script named ex1.R library ( \"Rmpi\" ) n <- mpi.comm.size ( 0 ) me <- mpi.comm.rank ( 0 ) mpi.barrier ( 0 ) val <- 777 mpi.bcast ( val , 1 , 0 , 0 ) print ( paste ( \"me\" , me , \"val\" , val )) mpi.barrier ( 0 ) mpi.quit () Then we can launch it with an sbatch script ( ex1.sh ): #!/bin/bash #SBATCH -n 4 #SBATCH -t 5:00 module purge module load R/4.1.0-foss-2020b srun Rscript ex1.R This script should execute a simple broadcast and complete in a few seconds. Virtual Display Session It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". There is a tool, xvfb , which can help avoid these issues. The wrapper xvfb-run creates a virtual display session which allows R to create these figures without an X11 session. See the guide for xvfb for more details. Conda-based R Environments If there isn't a module available for the version of R you want, you can set up your own R installation using Conda . With Conda you can manage your own packages and dependencies, for R, Python, etc. Most of the time the best way to install R packages for your Conda R environment is via conda . # load miniconda module load miniconda # create the conda environment including r-base and r-essentials package collections conda create --name my_r_env r-base r-essentials # activate the environment conda activate my_r_env # Install the lattice package (r-lattice) conda install r-lattice If there are packages that conda does not provide, you can install using the install.packages function, but this may occasionally not work as well. When you install packages with install.packages make sure to activate your Conda environment first. salloc module load miniconda source activate my_r_env R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) Warning Conda-based R may not work properly with parallel packages like Rmpi when running across multiple compute nodes. In general, it's best to use the module installation of R for anything which requires MPI.","title":"R"},{"location":"clusters-at-yale/guides/r/#r","text":"R is a free software environment for statistical computing and graphics. On the Yale Clusters there are a couple ways in which you can set up your R environment. There is no R executable provided by default; you have to choose one of the following methods to be able to run R.","title":"R"},{"location":"clusters-at-yale/guides/r/#the-r-module","text":"We provide several versions of R as software modules . These modules provide a broad selection of commonly used packages pre-installed. Notably, this includes a number of geospatial packages like sf , sp , raster , and terra . In addition, we install a collection of the most common bioconductor bioinformatics packages ( homepage ) called R-bundle-Bioconductor . This can be loaded in addition to the matching R module to provide simple access to these tools.","title":"The R Module"},{"location":"clusters-at-yale/guides/r/#find-and-load-r","text":"Find the available versions of R version 4 with: module avail R/4 To load version 4.2.0: module load R/4.2.0-foss-2020b To show installed R packages and their versions for the R/4.2.0 module: module help R/4.2.0-foss-2020b Between the base R module and the R-bundle-Bioconductor module, there are over 1000 R packages installed and ready to use. 
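To list everything a given bundle provides, the same module help approach shown above should also work for the Bioconductor bundle (a hedged example; the exact version string may differ on your cluster):

module help R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0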
To find if your desired package is available in these modules, you can run module spider $PACKAGE/$VERSION : module spider Seurat/4.1.1 -------------------------------------------------------------------------------------------------------------------------------------------------------- Seurat: Seurat/4.1.1 ( E ) -------------------------------------------------------------------------------------------------------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 Names marked by a trailing ( E ) are extensions provided by another module. So to get this version of Seurat, you can load the R-bundle-Bioconductor module. Then you simple library(Seurat) to use that tool.","title":"Find and Load R"},{"location":"clusters-at-yale/guides/r/#install-packages","text":"The software modules include many commonly used packages, but you can install additional packages specifically for your account. As part of the R software modules we define an environment variable which directs R to install packages to your project space. This helps prevent issues where R cannot install packages due to home-space quotas. To change the location of where R installs packages, the R_LIBS_USER variable can be set in your ~/.bashrc file: export R_LIBS_USER=$GIBBS_PROJECT/R/%v where %v is a placeholder for the R major and minor version number (e.g. 4.2 ). R will replace this variable with the correct value automatically to segregate packages installed with different versions of R. We recommend you install packages in an interactive job with the slurm option -C oldest . This will ensure the compiled portions of your R library are compatible with all compute nodes on the cluster. If there is a missing library your package of interest needs you should be able to load its module. If you cannot find a dependency or have trouble installing an R package, please get in touch with us . Warning Grace's login nodes have newer architecture than the oldest nodes on the cluster. Always install packages in an interactive job submitted with the -C oldest Slurm flag if you want to ensure your code will work on all generations of the compute nodes. To get started load the R module and start R: salloc module load R/4.2.0-foss-2020b R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) This will throw a warning like: Warning in install.packages ( \"lattice\" ) : 'lib = \"/ysm-gpfs/apps/software/R/4.2.0-foss-2020b/lib64/R/library\"' is not writable Would you like to create a personal library /gpfs/gibbs/project/support/tl397/R/4.1 to install packages into? ( y/n ) Note If you encounter a permission error because the installation does not prompt you to create a personal library, create the directory in the default location in your home directory for the version of R you are using; e.g., mkdir /path/to/your/project/space/R/4.2 You only need the general minor version such as 4.2 instead of 4.2.2. You can customize where packages are installed and accessed for a particular R session using the .libPaths function in R: # List current package locations > .libPaths() # Add new default location to the standard defaults, e.g. 
project/my_R_libs > .libPaths(c(\"/home/netID/project/my_R_libs/\", .libPaths()))","title":"Install Packages"},{"location":"clusters-at-yale/guides/r/#run-r","text":"We will kill R jobs on the login nodes that are using excessive resources. To be a good cluster citizen, launch your R computation in jobs. See our Slurm documentation for more detailed information on submitting jobs.","title":"Run R"},{"location":"clusters-at-yale/guides/r/#interactive-job","text":"To run R interactively, first launch an interactive job on a compute node. If your R sessions will need up to 10 GiB of RAM and up to 4 hours, you would submit you job with: salloc --mem = 10G -t 4 :00:00 Once your interactive session starts, you can load the appropriate module or Conda environment (see above) and start R by entering R on your command prompt. If you are happy with your R commands, save them to a file which can then be submitted and run as a batch job.","title":"Interactive Job"},{"location":"clusters-at-yale/guides/r/#batch-mode","text":"To run R in batch mode, create a plain-text batch script to submit. In that script, you can run your R script. In this case myscript.R is in the same directory as the batch script, batch script contents shown below. #!/bin/bash #SBATCH -J my_r_program #SBATCH --mem=10G #SBATCH -t 4:00:00 module load R/4.1.0-foss-2020b Rscript myscript.R To actually submit the job, run sbatch my_r_job.sh where the batch script above was saved as my_r_job.sh .","title":"Batch Mode"},{"location":"clusters-at-yale/guides/r/#rstudio","text":"You can run RStudio app via Open Ondemand . Here you can select the desired version of R and RStudio and launch an interactive compute session.","title":"RStudio"},{"location":"clusters-at-yale/guides/r/#parallel-r","text":"On a cluster you may want to use R in parallel across multiple nodes. While there are a few different ways this can be achieved, we recommend using the R software module which already includes Rmpi , parallel , and doMC . To test it, we can create a simple R script named ex1.R library ( \"Rmpi\" ) n <- mpi.comm.size ( 0 ) me <- mpi.comm.rank ( 0 ) mpi.barrier ( 0 ) val <- 777 mpi.bcast ( val , 1 , 0 , 0 ) print ( paste ( \"me\" , me , \"val\" , val )) mpi.barrier ( 0 ) mpi.quit () Then we can launch it with an sbatch script ( ex1.sh ): #!/bin/bash #SBATCH -n 4 #SBATCH -t 5:00 module purge module load R/4.1.0-foss-2020b srun Rscript ex1.R This script should execute a simple broadcast and complete in a few seconds.","title":"Parallel R"},{"location":"clusters-at-yale/guides/r/#virtual-display-session","text":"It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". There is a tool, xvfb , which can help avoid these issues. The wrapper xvfb-run creates a virtual display session which allows R to create these figures without an X11 session. See the guide for xvfb for more details.","title":"Virtual Display Session"},{"location":"clusters-at-yale/guides/r/#conda-based-r-environments","text":"If there isn't a module available for the version of R you want, you can set up your own R installation using Conda . With Conda you can manage your own packages and dependencies, for R, Python, etc. Most of the time the best way to install R packages for your Conda R environment is via conda . 
# load miniconda module load miniconda # create the conda environment including r-base and r-essentials package collections conda create --name my_r_env r-base r-essentials # activate the environment conda activate my_r_env # Install the lattice package (r-lattice) conda install r-lattice If there are packages that conda does not provide, you can install using the install.packages function, but this may occasionally not work as well. When you install packages with install.packages make sure to activate your Conda environment first. salloc module load miniconda source activate my_r_env R # in R > install.packages ( \"lattice\" , repos = \"http://cran.r-project.org\" ) Warning Conda-based R may not work properly with parallel packages like Rmpi when running across multiple compute nodes. In general, it's best to use the module installation of R for anything which requires MPI.","title":"Conda-based R Environments"},{"location":"clusters-at-yale/guides/rclone/","text":"Rclone rclone is a command line tool to sync files and directories to and from all major cloud storage sites. You can use rclone to sync files and directories between Yale clusters and Yale Box, google drive, etc. The following instructions cover basics to setup and use rclone on Yale clusters. For more information about Rclone, please visit its website at https://rclone.org . Set up Rclone on YCRC clusters Before accessing a remote cloud storage using rclone , you need to run rclone config to configure the storage for rclone . Since rclone config will try to bring up a browser for you to authorize the cloud storage, we recommend you to use Open OnDemand . To run rclone config on OOD, first click Remote Desktop from the OOD dashboard. Once a session starts running, click Connect to Remote Desktop and you will see a terminal on the desktop in the browser. Run rclone config at the command line of the terminal. During configuration, you will see a message similar to the following: If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... If no browser started automatically, then start Firefox manually by clicking the Firefox icon on the top bar of the Remote Desktop. Copy the link from the message shown on your screen and paste it to the address bar of Firefox. Log in with your Yale email address, respond to the DUO request, and authorize the access. Tip If you received an error stating that your session has expired for DUO, simply paste the link and reload the page. If you still get the expired message, log out of CAS in your browser by going to https://secure.its.yale.edu/cas/logout. After logging out, paste the link and reload. Examples The following examples show you how to set up rclone for a viriety of different storage types. In the examples, we name our remote cloud storage as 'remote' in the configuration. You can provide any name you want. Google Drive The example below is a screen dump when setting up rclone for Google Drive. [ pl543@c03n06 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" 2 / Alias for an existing remote \\ \"alias\" [ ... ] 15 / Google Drive \\ \"drive\" [ ... 
] 42 / seafile \\ \"seafile\" Storage> 15 ** See help for drive backend at: https://rclone.org/drive/ ** Google Application Client Id Setting your own is recommended. See https://rclone.org/drive/#making-your-own-client-id for how to create your own. If you leave this blank, it will use an internal key which is low performance. Enter a string value. Press Enter for the default ( \"\" ) . client_id> OAuth Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Scope that rclone should use when requesting access from drive. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / Full access all files, excluding Application Data Folder. \\ \"drive\" 2 / Read-only access to file metadata and file contents. \\ \"drive.readonly\" / Access to files created by rclone only. 3 | These are visible in the drive website. | File authorization is revoked when the user deauthorizes the app. \\ \"drive.file\" / Allows read and write access to the Application Data folder. 4 | This is not visible in the drive website. \\ \"drive.appfolder\" / Allows read-only access to file metadata but 5 | does not allow any access to read or download file content. \\ \"drive.metadata.readonly\" scope> 1 ID of the root folder Leave blank normally. Fill in to access \"Computers\" folders ( see docs ) , or for rclone to use a non root folder as its starting point. Enter a string value. Press Enter for the default ( \"\" ) . root_folder_id> Service Account Credentials JSON file path Leave blank normally. Needed only if you want use SA instead of interactive login. Leading ` ~ ` will be expanded in the file name as will environment variables such as ` ${ RCLONE_CONFIG_DIR } ` . Enter a string value. Press Enter for the default ( \"\" ) . service_account_file> Edit advanced config? ( y/n ) y ) Yes n ) No ( default ) y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes ( default ) n ) No y/n> y If your browser doesn ' t open automatically go to the following link: http://127.0.0.1:53682/auth?state = 6glRr_mpEORxHevlOaaYyw Log in and authorize rclone for access Waiting for code... Got code Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> n -------------------- [ remote ] type = drive scope = drive token = { \"access_token\" : \"ya29.A0ArdaM-mBYFKBE2gieODvdANCZRV6Y8QHhQF-lY74E9fr1HTLOwwLRuoQQbO9P-Jdip62YYhqXfcuWT0KLKGdhUb9M8g2Z4XEQqoNLwZyA-FA2AAYYBqB\" , \"token_type\" : \"Bearer\" , \"refresh_token\" : \"1//0dDu3r6KVakgYIARAAGA0NwF-L9IrWIuG7_f44x-uLR2vvBocf4q-KnQVhlkm18TO2Fn0GjJp-cArWfj5kY84\" , \"expiry\" : \"2021-02-25T17:28:18.629507046-05 :00\" } -------------------- y ) Yes this is OK ( default ) e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote drive e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q Box The example below is a screen dump when setting up rclone for Yale Box. [ pl543@c14n07 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" [ ... ] 6 / Box \\ \"box\" [ ... 
] Storage> box ** See help for box backend at: https://rclone.org/box/ ** Box App Client Id. Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_id> Box App Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Edit advanced config? ( y/n ) y ) Yes n ) No y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes n ) No y/n> y If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... Got code -------------------- [ remote ] type = box token = { \"access_token\" : \"PjIXHUZ34VQSmeUZ9r6bhc9ux44KMU0e\" , \"token_type\" : \"bearer\" , \"refresh_token\" : \"VumWPWP5Nd0M2C1GyfgfJL51zUeWPPVLc6VC6lBQduEPsQ9a6ibSor2dvHmyZ6B8\" , \"expiry\" : \"2019-10-21T11:00:36.839586736-04:00\" } -------------------- y ) Yes this is OK e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote box e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q S3 The example below is a screen dump when setting up rclone for an S3 provider such as aws. [ rdb9@login1.mccleary ~ ] $ rclone config Enter configuration password: password: Current remotes: Name Type ==== ==== [ ... ] e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> n ``` bash Enter name for new remote. name> remote Option Storage. Type of storage to configure. Choose a number from below, or type in your own value. [ ... ] 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, DigitalOcean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, IONOS Cloud, Liara, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS, Qiniu and Wasabi \\ ( s3 ) [ ... ] Storage> 5 Option provider. Choose your S3 provider. Choose a number from below, or type in your own value. Press Enter to leave empty. 1 / Amazon Web Services ( AWS ) S3 \\ ( AWS ) [ ... ] provider> 1 Option env_auth. Get AWS credentials from runtime ( environment variables or EC2/ECS meta data if no env vars ) . Only applies if access_key_id and secret_access_key is blank. Choose a number from below, or type in your own boolean value ( true or false ) . Press Enter for the default ( false ) . 1 / Enter AWS credentials in the next step. \\ ( false ) 2 / Get AWS credentials from the environment ( env vars or IAM ) . \\ ( true ) env_auth> Option access_key_id. AWS Access Key ID. Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. access_key_id> *************** Option secret_access_key. AWS Secret Access Key ( password ) . Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. secret_access_key> ************* Option region. Region to connect to. Choose a number from below, or type in your own value. Press Enter to leave empty. / The default endpoint - a good choice if you are unsure. 1 | US Region, Northern Virginia, or Pacific Northwest. | Leave location constraint empty. \\ ( us-east-1 ) / US East ( Ohio ) Region. [ ... ] [ take defaults for all remaining questions Edit advanced config? y ) Yes n ) No ( default ) y/n> n Configuration complete. 
Options: - type: s3 - provider: AWS - access_key_id: *************** - secret_access_key: **************** - region: us-east-1 Tip if you want to use rclone for a shared google drive, you should answer 'y' when it asks whether you want to configure it as a Shared Drive. Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> y Tip rclone config creates a file storing cloud storage configurations for rclone. You can check the file name with rclone config file . The config file can be copied to other clusters so that you can use rclone on the other clusters without running rclone config again. Use Rclone on Yale clusters As we have used remote as the name of the cloud storage in our examples above, we will continue using it in the following examples. You should replace it with the actual name you have picked up for the cloud storage in your configuration. Tip If you forgot the name of the cloud storage you have configured, run rclone config show and the name will be shown in [] . $ rclone config show [ remote ] type = drive scope = drive token = { \"access_token\" : \"mytoken\" , \"expiry\" : \"2021-07-09T22:13:56.452750648-04:00\" } root_folder_id = myid List files rclone ls remote:/ Copy files # to download a file to the cluster rclone copy remote:/path/to/filename . # to upload a file from the cluster to the cloud storage rclone copy filename remote:/path/to/ Help rclone help","title":"Rclone"},{"location":"clusters-at-yale/guides/rclone/#rclone","text":"rclone is a command line tool to sync files and directories to and from all major cloud storage sites. You can use rclone to sync files and directories between Yale clusters and Yale Box, google drive, etc. The following instructions cover basics to setup and use rclone on Yale clusters. For more information about Rclone, please visit its website at https://rclone.org .","title":"Rclone"},{"location":"clusters-at-yale/guides/rclone/#set-up-rclone-on-ycrc-clusters","text":"Before accessing a remote cloud storage using rclone , you need to run rclone config to configure the storage for rclone . Since rclone config will try to bring up a browser for you to authorize the cloud storage, we recommend you to use Open OnDemand . To run rclone config on OOD, first click Remote Desktop from the OOD dashboard. Once a session starts running, click Connect to Remote Desktop and you will see a terminal on the desktop in the browser. Run rclone config at the command line of the terminal. During configuration, you will see a message similar to the following: If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... If no browser started automatically, then start Firefox manually by clicking the Firefox icon on the top bar of the Remote Desktop. Copy the link from the message shown on your screen and paste it to the address bar of Firefox. Log in with your Yale email address, respond to the DUO request, and authorize the access. Tip If you received an error stating that your session has expired for DUO, simply paste the link and reload the page. If you still get the expired message, log out of CAS in your browser by going to https://secure.its.yale.edu/cas/logout. After logging out, paste the link and reload.","title":"Set up Rclone on YCRC clusters"},{"location":"clusters-at-yale/guides/rclone/#examples","text":"The following examples show you how to set up rclone for a viriety of different storage types. 
In the examples, we name our remote cloud storage as 'remote' in the configuration. You can provide any name you want. Google Drive The example below is a screen dump when setting up rclone for Google Drive. [ pl543@c03n06 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" 2 / Alias for an existing remote \\ \"alias\" [ ... ] 15 / Google Drive \\ \"drive\" [ ... ] 42 / seafile \\ \"seafile\" Storage> 15 ** See help for drive backend at: https://rclone.org/drive/ ** Google Application Client Id Setting your own is recommended. See https://rclone.org/drive/#making-your-own-client-id for how to create your own. If you leave this blank, it will use an internal key which is low performance. Enter a string value. Press Enter for the default ( \"\" ) . client_id> OAuth Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Scope that rclone should use when requesting access from drive. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / Full access all files, excluding Application Data Folder. \\ \"drive\" 2 / Read-only access to file metadata and file contents. \\ \"drive.readonly\" / Access to files created by rclone only. 3 | These are visible in the drive website. | File authorization is revoked when the user deauthorizes the app. \\ \"drive.file\" / Allows read and write access to the Application Data folder. 4 | This is not visible in the drive website. \\ \"drive.appfolder\" / Allows read-only access to file metadata but 5 | does not allow any access to read or download file content. \\ \"drive.metadata.readonly\" scope> 1 ID of the root folder Leave blank normally. Fill in to access \"Computers\" folders ( see docs ) , or for rclone to use a non root folder as its starting point. Enter a string value. Press Enter for the default ( \"\" ) . root_folder_id> Service Account Credentials JSON file path Leave blank normally. Needed only if you want use SA instead of interactive login. Leading ` ~ ` will be expanded in the file name as will environment variables such as ` ${ RCLONE_CONFIG_DIR } ` . Enter a string value. Press Enter for the default ( \"\" ) . service_account_file> Edit advanced config? ( y/n ) y ) Yes n ) No ( default ) y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes ( default ) n ) No y/n> y If your browser doesn ' t open automatically go to the following link: http://127.0.0.1:53682/auth?state = 6glRr_mpEORxHevlOaaYyw Log in and authorize rclone for access Waiting for code... Got code Configure this as a Shared Drive ( Team Drive ) ? 
y ) Yes n ) No ( default ) y/n> n -------------------- [ remote ] type = drive scope = drive token = { \"access_token\" : \"ya29.A0ArdaM-mBYFKBE2gieODvdANCZRV6Y8QHhQF-lY74E9fr1HTLOwwLRuoQQbO9P-Jdip62YYhqXfcuWT0KLKGdhUb9M8g2Z4XEQqoNLwZyA-FA2AAYYBqB\" , \"token_type\" : \"Bearer\" , \"refresh_token\" : \"1//0dDu3r6KVakgYIARAAGA0NwF-L9IrWIuG7_f44x-uLR2vvBocf4q-KnQVhlkm18TO2Fn0GjJp-cArWfj5kY84\" , \"expiry\" : \"2021-02-25T17:28:18.629507046-05 :00\" } -------------------- y ) Yes this is OK ( default ) e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote drive e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q Box The example below is a screen dump when setting up rclone for Yale Box. [ pl543@c14n07 ~ ] $ rclone config No remotes found - make a new one n ) New remote s ) Set configuration password q ) Quit config n/s/q> n name> remote Type of storage to configure. Enter a string value. Press Enter for the default ( \"\" ) . Choose a number from below, or type in your own value 1 / 1Fichier \\ \"fichier\" [ ... ] 6 / Box \\ \"box\" [ ... ] Storage> box ** See help for box backend at: https://rclone.org/box/ ** Box App Client Id. Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_id> Box App Client Secret Leave blank normally. Enter a string value. Press Enter for the default ( \"\" ) . client_secret> Edit advanced config? ( y/n ) y ) Yes n ) No y/n> n Remote config Use auto config? * Say Y if not sure * Say N if you are working on a remote or headless machine y ) Yes n ) No y/n> y If your browser does not open automatically go to the following link: http://127.0.0.1:53682/auth Log in and authorize rclone for access Waiting for code... Got code -------------------- [ remote ] type = box token = { \"access_token\" : \"PjIXHUZ34VQSmeUZ9r6bhc9ux44KMU0e\" , \"token_type\" : \"bearer\" , \"refresh_token\" : \"VumWPWP5Nd0M2C1GyfgfJL51zUeWPPVLc6VC6lBQduEPsQ9a6ibSor2dvHmyZ6B8\" , \"expiry\" : \"2019-10-21T11:00:36.839586736-04:00\" } -------------------- y ) Yes this is OK e ) Edit this remote d ) Delete this remote y/e/d> y Current remotes: Name Type ==== ==== remote box e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> q S3 The example below is a screen dump when setting up rclone for an S3 provider such as aws. [ rdb9@login1.mccleary ~ ] $ rclone config Enter configuration password: password: Current remotes: Name Type ==== ==== [ ... ] e ) Edit existing remote n ) New remote d ) Delete remote r ) Rename remote c ) Copy remote s ) Set configuration password q ) Quit config e/n/d/r/c/s/q> n ``` bash Enter name for new remote. name> remote Option Storage. Type of storage to configure. Choose a number from below, or type in your own value. [ ... ] 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, DigitalOcean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, IONOS Cloud, Liara, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS, Qiniu and Wasabi \\ ( s3 ) [ ... ] Storage> 5 Option provider. Choose your S3 provider. Choose a number from below, or type in your own value. Press Enter to leave empty. 1 / Amazon Web Services ( AWS ) S3 \\ ( AWS ) [ ... ] provider> 1 Option env_auth. 
Get AWS credentials from runtime ( environment variables or EC2/ECS meta data if no env vars ) . Only applies if access_key_id and secret_access_key is blank. Choose a number from below, or type in your own boolean value ( true or false ) . Press Enter for the default ( false ) . 1 / Enter AWS credentials in the next step. \\ ( false ) 2 / Get AWS credentials from the environment ( env vars or IAM ) . \\ ( true ) env_auth> Option access_key_id. AWS Access Key ID. Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. access_key_id> *************** Option secret_access_key. AWS Secret Access Key ( password ) . Leave blank for anonymous access or runtime credentials. Enter a value. Press Enter to leave empty. secret_access_key> ************* Option region. Region to connect to. Choose a number from below, or type in your own value. Press Enter to leave empty. / The default endpoint - a good choice if you are unsure. 1 | US Region, Northern Virginia, or Pacific Northwest. | Leave location constraint empty. \\ ( us-east-1 ) / US East ( Ohio ) Region. [ ... ] [ take defaults for all remaining questions Edit advanced config? y ) Yes n ) No ( default ) y/n> n Configuration complete. Options: - type: s3 - provider: AWS - access_key_id: *************** - secret_access_key: **************** - region: us-east-1 Tip if you want to use rclone for a shared google drive, you should answer 'y' when it asks whether you want to configure it as a Shared Drive. Configure this as a Shared Drive ( Team Drive ) ? y ) Yes n ) No ( default ) y/n> y Tip rclone config creates a file storing cloud storage configurations for rclone. You can check the file name with rclone config file . The config file can be copied to other clusters so that you can use rclone on the other clusters without running rclone config again.","title":"Examples"},{"location":"clusters-at-yale/guides/rclone/#use-rclone-on-yale-clusters","text":"As we have used remote as the name of the cloud storage in our examples above, we will continue using it in the following examples. You should replace it with the actual name you have picked up for the cloud storage in your configuration. Tip If you forgot the name of the cloud storage you have configured, run rclone config show and the name will be shown in [] . $ rclone config show [ remote ] type = drive scope = drive token = { \"access_token\" : \"mytoken\" , \"expiry\" : \"2021-07-09T22:13:56.452750648-04:00\" } root_folder_id = myid","title":"Use Rclone on Yale clusters"},{"location":"clusters-at-yale/guides/rclone/#list-files","text":"rclone ls remote:/","title":"List files"},{"location":"clusters-at-yale/guides/rclone/#copy-files","text":"# to download a file to the cluster rclone copy remote:/path/to/filename . # to upload a file from the cluster to the cloud storage rclone copy filename remote:/path/to/","title":"Copy files"},{"location":"clusters-at-yale/guides/rclone/#help","text":"rclone help","title":"Help"},{"location":"clusters-at-yale/guides/tmux/","text":"tmux tmux is a \"terminal multiplexer\", it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. tmux is a great way to save an interactive session between connections you make to the clusters. You can reconnect to the session from a workstation in your lab or from your laptop from home! 
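As a rough sketch of the typical cluster workflow (the session name work is a placeholder and the login host depends on your cluster), each step of which is covered in the sections below:
ssh netid@cluster_login_node   # log in to a login node
tmux new -s work               # start a named tmux session on the login node
salloc                         # request an interactive allocation from inside tmux
# run your programs on the compute node, then detach with Ctrl + b then d
tmux attach -t work            # later, reattach from the same login node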
Get Started To begin a tmux session named myproject, type tmux new -s myproject You should see a bar across the bottom of your terminal window now that gives you some information about your session. If you are disconnected or detached from this session, anything you were doing will still be there waiting when you reattach The most important shortcut to remember is Ctrl + b (hold the ctrl or control key, then type \"b\"). This is how you signal to tmux that the following keystroke is meant for it and not the session you are working in. For example: if you want to gracefully detach from your session, you can type Ctrl + b , then d for detach. To reattach to our sample tmux session after detatching, type: tmux attach -t myproject #If you are lazy and have only one session running, #This works too: tmux a Lines starting with a \"#\" denote a commented line, which aren't read as code Finally, to exit, you can type exit or Ctrl + d tmux on the Clusters Using tmux on the cluster allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. salloc ) then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly: ssh to your cluster of choice Start tmux Inside your tmux session, submit an interactive job with salloc . See the Slurm documentation for more details Inside your job allocation (on a compute node), start your application (e.g. matlab) Detach from tmux by typing Ctrl + b then d Later, on the same login node, reattach by running tmux attach Make sure to: run tmux on the login node, NOT on compute nodes run salloc inside tmux, not the reverse. Warning Every cluster has two login nodes. If you cannot find your tmux session, it might be running on the other node. Check the hostname of your current login node (from either your command prompt or from running hostname -s ), then use ssh to login to the other one. For example, if you are logged in to grace1, use ssh -Y grace2 to reach the other login node. Windows and Panes tmux allows you to create, toggle between and manipulate panes and windows in your session. A window is the whole screen that tmux displays to you. Panes are subdivisions in the curent window, where each runs an independent terminal. Especially at first, you probably won't need more than one pane at a time. Multiple windows can be created and run off-screen. Here is an example where this may be useful. Say you just submitted an interactive job that is running on a compute node inside your tmux session. [ ms725@grace1 ~ ] $ tmux new -s analysis # I am in my tmux session now [ ms725@grace1 ~ ] $ salloc [ ms725@c14n02 ~ ] $ ./my_fancy_analysis.sh Now you can easily monitor its CPU and memory utilization without ever taking your eyes off of it by creating a new pane and running top there. Split your window by typing: Ctrl + b then % ssh into the compute node you are working on, then run top to watch your work as it runs all from the same window. # I'm in a new pane now. [ ms725@grace1 ~ ] $ ssh c14n02 [ ms725@c14n02 ~ ] $ top Your view will look something like this: To switch back and forth between panes, type Ctrl + b then o","title":"tmux"},{"location":"clusters-at-yale/guides/tmux/#tmux","text":"tmux is a \"terminal multiplexer\", it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. 
tmux is a great way to save an interactive session between connections you make to the clusters. You can reconnect to the session from a workstation in your lab or from your laptop from home!","title":"tmux"},{"location":"clusters-at-yale/guides/tmux/#get-started","text":"To begin a tmux session named myproject, type tmux new -s myproject You should see a bar across the bottom of your terminal window now that gives you some information about your session. If you are disconnected or detached from this session, anything you were doing will still be there waiting when you reattach The most important shortcut to remember is Ctrl + b (hold the ctrl or control key, then type \"b\"). This is how you signal to tmux that the following keystroke is meant for it and not the session you are working in. For example: if you want to gracefully detach from your session, you can type Ctrl + b , then d for detach. To reattach to our sample tmux session after detatching, type: tmux attach -t myproject #If you are lazy and have only one session running, #This works too: tmux a Lines starting with a \"#\" denote a commented line, which aren't read as code Finally, to exit, you can type exit or Ctrl + d","title":"Get Started"},{"location":"clusters-at-yale/guides/tmux/#tmux-on-the-clusters","text":"Using tmux on the cluster allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. salloc ) then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly: ssh to your cluster of choice Start tmux Inside your tmux session, submit an interactive job with salloc . See the Slurm documentation for more details Inside your job allocation (on a compute node), start your application (e.g. matlab) Detach from tmux by typing Ctrl + b then d Later, on the same login node, reattach by running tmux attach Make sure to: run tmux on the login node, NOT on compute nodes run salloc inside tmux, not the reverse. Warning Every cluster has two login nodes. If you cannot find your tmux session, it might be running on the other node. Check the hostname of your current login node (from either your command prompt or from running hostname -s ), then use ssh to login to the other one. For example, if you are logged in to grace1, use ssh -Y grace2 to reach the other login node.","title":"tmux on the Clusters"},{"location":"clusters-at-yale/guides/tmux/#windows-and-panes","text":"tmux allows you to create, toggle between and manipulate panes and windows in your session. A window is the whole screen that tmux displays to you. Panes are subdivisions in the curent window, where each runs an independent terminal. Especially at first, you probably won't need more than one pane at a time. Multiple windows can be created and run off-screen. Here is an example where this may be useful. Say you just submitted an interactive job that is running on a compute node inside your tmux session. [ ms725@grace1 ~ ] $ tmux new -s analysis # I am in my tmux session now [ ms725@grace1 ~ ] $ salloc [ ms725@c14n02 ~ ] $ ./my_fancy_analysis.sh Now you can easily monitor its CPU and memory utilization without ever taking your eyes off of it by creating a new pane and running top there. 
Split your window by typing: Ctrl + b then % ssh into the compute node you are working on, then run top to watch your work as it runs all from the same window. # I'm in a new pane now. [ ms725@grace1 ~ ] $ ssh c14n02 [ ms725@c14n02 ~ ] $ top Your view will look something like this: To switch back and forth between panes, type Ctrl + b then o","title":"Windows and Panes"},{"location":"clusters-at-yale/guides/vasp/","text":"VASP Note VASP requires a paid license. If you wish to use VASP on the cluster and your research group has purchased a license, please contact us to gain access to the cluster installation. Thank you for your cooperation. VASP and Slurm In Slurm, there is big difference between --ntasks and --cpus-per-task which is explained in our Requesting Resources documentation . For the purposes of VASP, --ntasks-per-node should always equal NCORE (in your INCAR file). Then --nodes should be equal to the total number of cores you want, divided by --ntasks-per-node . VASP has two parameters for controlling processor layouts, NCORE and NPAR , but you only need to set one of them. If you set NCORE , you don\u2019t need to set NPAR . Instead VASP will automatically set NPAR . In your mpirun line, you should specify the number of MPI tasks as: mpirun -n $SLURM_NTASKS vasp_std Cores Layout Examples If you want 40 cores (2 nodes and 20 cpus per node): in your submission script: #SBATCH --nodes=2 #SBATCH --ntasks-per-node=20 mpirun -n 2 vasp_std in INCAR : NCORE=20 You may however find that the wait time to get 20 cores on two nodes can be very long since cores request via --cpus-per-task can\u2019t span multiple nodes. Instead you might want to try breaking it up into smaller chunks. Therefore, try: in your submission script: #SBATCH --nodes=4 #SBATCH --ntasks-per-node=10 mpirun -n 4 vasp_std in INCAR : NCORE=10 which would likely spread over 4 nodes using 10 cores each and spend less time in the queue. Grace mpi partition On Grace's mpi parttion, since cores are assigned as whole 24-core nodes, NCORE should always be equal to 24 and then you can just request ntasks in multiples of 24. in your submission script: #SBATCH --ntasks=48 # some multiple of 24 mpirun -n $SLURM_NTASKS vasp_std in INCAR : NCORE=24 Additional Performance Some users have found that if they actually assign 2 MPI tasks per node (rather than 1), they see even better performance because the MPI tasks doesn't span the two sockets on the node. To try this, set NCORE to half of your nodes' core count and increase mpirun -n to twice the number of nodes you requested. Additional Reading Here is some documentation on how to optimally configure NCORE and NPAR: https://www.vasp.at/wiki/index.php/NCORE https://www.vasp.at/wiki/index.php/NPAR https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/","title":"VASP"},{"location":"clusters-at-yale/guides/vasp/#vasp","text":"Note VASP requires a paid license. If you wish to use VASP on the cluster and your research group has purchased a license, please contact us to gain access to the cluster installation. Thank you for your cooperation.","title":"VASP"},{"location":"clusters-at-yale/guides/vasp/#vasp-and-slurm","text":"In Slurm, there is big difference between --ntasks and --cpus-per-task which is explained in our Requesting Resources documentation . For the purposes of VASP, --ntasks-per-node should always equal NCORE (in your INCAR file). Then --nodes should be equal to the total number of cores you want, divided by --ntasks-per-node . 
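For example (a sketch, assuming you want 96 cores in total and have set NCORE=24 in your INCAR): set --ntasks-per-node=24 to match NCORE, and --nodes=96/24=4:
#SBATCH --nodes=4             # 96 total cores / 24 tasks per node
#SBATCH --ntasks-per-node=24  # equal to NCORE in INCAR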
VASP has two parameters for controlling processor layouts, NCORE and NPAR , but you only need to set one of them. If you set NCORE , you don\u2019t need to set NPAR . Instead VASP will automatically set NPAR . In your mpirun line, you should specify the number of MPI tasks as: mpirun -n $SLURM_NTASKS vasp_std","title":"VASP and Slurm"},{"location":"clusters-at-yale/guides/vasp/#cores-layout-examples","text":"If you want 40 cores (2 nodes and 20 cpus per node): in your submission script: #SBATCH --nodes=2 #SBATCH --ntasks-per-node=20 mpirun -n 2 vasp_std in INCAR : NCORE=20 You may however find that the wait time to get 20 cores on two nodes can be very long since cores request via --cpus-per-task can\u2019t span multiple nodes. Instead you might want to try breaking it up into smaller chunks. Therefore, try: in your submission script: #SBATCH --nodes=4 #SBATCH --ntasks-per-node=10 mpirun -n 4 vasp_std in INCAR : NCORE=10 which would likely spread over 4 nodes using 10 cores each and spend less time in the queue.","title":"Cores Layout Examples"},{"location":"clusters-at-yale/guides/vasp/#grace-mpi-partition","text":"On Grace's mpi parttion, since cores are assigned as whole 24-core nodes, NCORE should always be equal to 24 and then you can just request ntasks in multiples of 24. in your submission script: #SBATCH --ntasks=48 # some multiple of 24 mpirun -n $SLURM_NTASKS vasp_std in INCAR : NCORE=24","title":"Grace mpi partition"},{"location":"clusters-at-yale/guides/vasp/#additional-performance","text":"Some users have found that if they actually assign 2 MPI tasks per node (rather than 1), they see even better performance because the MPI tasks doesn't span the two sockets on the node. To try this, set NCORE to half of your nodes' core count and increase mpirun -n to twice the number of nodes you requested.","title":"Additional Performance"},{"location":"clusters-at-yale/guides/vasp/#additional-reading","text":"Here is some documentation on how to optimally configure NCORE and NPAR: https://www.vasp.at/wiki/index.php/NCORE https://www.vasp.at/wiki/index.php/NPAR https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/","title":"Additional Reading"},{"location":"clusters-at-yale/guides/virtualgl/","text":"VirtualGL Why VirtualGL To display a 3D application running remotely on a cluster, you could use X11 forwarding to display the application on your local machine. This is usually very slow and often unusable. An alternative approach is to use VNC - also called Remote Desktop - to run GUI applications remotely on the cluster. This approach only works well with applications that only need moderate 3D rendering where software rendering is good enough. For applications that need to render large complicated models, hardware accelerated 3D rendering must be used. However, VNC cannot directly utilize the graphic devices on the cluster for rendering. VirtualGL , in conjunction with VNC, provides a commonly used solution for remote 3D rendering with hardware acceleration. How to use VirtualGL VirtualGL 3.0+ supports the traditional GLX back end and the new EGL back end for 3D rendering. The EGL back end uses a DRI (Direct Rendering Infrastructure) device to access a graphics device, while the GLX back end uses an X server to access a graphics device. The EGL back end allows simultaneous jobs on the same node, each using their own dedicated GPU device for rendering. Although it can render many applications properly, the EGL back end may fail to render some applications. 
The GLX back end supports a wider range of OpenGL applications than the EGL back end, however, only one X server can work properly with the graphics devices on the node. This means only one job can use the GLX back end on any GPU node, no matter how many GPU devices the node has. We suggest you use the EGL back end first. If it does not render your application properly, then switch to the GLX back end. We have provided a wrapper script ycrc_vglrun to make it easy for you to choose which back end to use for 3D rendering. In the following examples, we will use ParaView (unless mentioned otherwise) to demonstrate how to use ycrc_vglrun . Note If you need to run a hardware accelerated GUI application, you should first start a Remote Desktop on a GPU node, and then run the application from the shell in the Remote Desktop as shown below. We have not incorporated VirtualGL into the standalone interactive Apps on OOD that could benefit from VirtualGL. However, this could change in the future. Use VirtualGL with the EGL back end EGL is the default back end which ycrc_vglrun will choose to use if no option is provided. You can also add the -e option to choose the EGL back end explicitly. module load ParaView ycrc_vglrun paraview module load ParaView ycrc_vglrun -e paraview Use VirtualGL with the GLX back end If your application cannot be rendered properly with the EGL back end, your next step is to try the GLX back end. You should choose it explicitly with the -g option. module load ParaView ycrc_vglrun -g paraview Run MATLAB with hardware OpenGL rendering By default, MATLAB will use software OpenGL rendering. To run MATLAB with hardware OpenGL rendering, add -nosoftwareopengl . module load MATLAB ycrc_vglrun matlab -nosoftwareopengl Troubleshoot nvidia-smi or vglrun cannot be found You must submit your job to a GPU node. If you are using the Remote Desktop from OOD, make sure you have specified gpu as 1 and partition as gpu or any other partition with GPU nodes. GLX back end is used by another application If you get the following message when running your application with the GLX back end, you need to add --exclude=nodename to Advanced options in the Remote Desktop OOD user interface and resubmit Remote Desktop. Replace nodename with the actual node name from the message. VirtualGL with the GLX back end is currently used by another application. Please resubmit your job with --exclude = c22n01","title":"VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#virtualgl","text":"","title":"VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#why-virtualgl","text":"To display a 3D application running remotely on a cluster, you could use X11 forwarding to display the application on your local machine. This is usually very slow and often unusable. An alternative approach is to use VNC - also called Remote Desktop - to run GUI applications remotely on the cluster. This approach only works well with applications that only need moderate 3D rendering where software rendering is good enough. For applications that need to render large complicated models, hardware accelerated 3D rendering must be used. However, VNC cannot directly utilize the graphic devices on the cluster for rendering. 
VirtualGL , in conjunction with VNC, provides a commonly used solution for remote 3D rendering with hardware acceleration.","title":"Why VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#how-to-use-virtualgl","text":"VirtualGL 3.0+ supports the traditional GLX back end and the new EGL back end for 3D rendering. The EGL back end uses a DRI (Direct Rendering Infrastructure) device to access a graphics device, while the GLX back end uses an X server to access a graphics device. The EGL back end allows simultaneous jobs on the same node, each using their own dedicated GPU device for rendering. Although it can render many applications properly, the EGL back end may fail to render some applications. The GLX back end supports a wider range of OpenGL applications than the EGL back end, however, only one X server can work properly with the graphics devices on the node. This means only one job can use the GLX back end on any GPU node, no matter how many GPU devices the node has. We suggest you use the EGL back end first. If it does not render your application properly, then switch to the GLX back end. We have provided a wrapper script ycrc_vglrun to make it easy for you to choose which back end to use for 3D rendering. In the following examples, we will use ParaView (unless mentioned otherwise) to demonstrate how to use ycrc_vglrun . Note If you need to run a hardware accelerated GUI application, you should first start a Remote Desktop on a GPU node, and then run the application from the shell in the Remote Desktop as shown below. We have not incorporated VirtualGL into the standalone interactive Apps on OOD that could benefit from VirtualGL. However, this could change in the future.","title":"How to use VirtualGL"},{"location":"clusters-at-yale/guides/virtualgl/#use-virtualgl-with-the-egl-back-end","text":"EGL is the default back end which ycrc_vglrun will choose to use if no option is provided. You can also add the -e option to choose the EGL back end explicitly. module load ParaView ycrc_vglrun paraview module load ParaView ycrc_vglrun -e paraview","title":"Use VirtualGL with the EGL back end"},{"location":"clusters-at-yale/guides/virtualgl/#use-virtualgl-with-the-glx-back-end","text":"If your application cannot be rendered properly with the EGL back end, your next step is to try the GLX back end. You should choose it explicitly with the -g option. module load ParaView ycrc_vglrun -g paraview","title":"Use VirtualGL with the GLX back end"},{"location":"clusters-at-yale/guides/virtualgl/#run-matlab-with-hardware-opengl-rendering","text":"By default, MATLAB will use software OpenGL rendering. To run MATLAB with hardware OpenGL rendering, add -nosoftwareopengl . module load MATLAB ycrc_vglrun matlab -nosoftwareopengl","title":"Run MATLAB with hardware OpenGL rendering"},{"location":"clusters-at-yale/guides/virtualgl/#troubleshoot","text":"","title":"Troubleshoot"},{"location":"clusters-at-yale/guides/virtualgl/#nvidia-smi-or-vglrun-cannot-be-found","text":"You must submit your job to a GPU node. 
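In scheduler terms, that means the job request must include a GPU; a rough sketch of the relevant options (exact partition names vary by cluster) is:
#SBATCH --partition=gpu
#SBATCH --gpus=1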
If you are using the Remote Desktop from OOD, make sure you have specified gpu as 1 and partition as gpu or any other partition with GPU nodes.","title":"nvidia-smi or vglrun cannot be found"},{"location":"clusters-at-yale/guides/virtualgl/#glx-back-end-is-used-by-another-application","text":"If you get the following message when running your application with the GLX back end, you need to add --exclude=nodename to Advanced options in the Remote Desktop OOD user interface and resubmit Remote Desktop. Replace nodename with the actual node name from the message. VirtualGL with the GLX back end is currently used by another application. Please resubmit your job with --exclude = c22n01","title":"GLX back end is used by another application"},{"location":"clusters-at-yale/guides/xvfb/","text":"Virtual Frame Buffer for Batch Mode Often there is a need to run a program with a graphical interface in batch mode. This can be either due to extended run-time or the desire to run many instances of the process at once. In either case the lack of a display can prevent the program from running. A solution has been developed to create a virtual display that only lives in memory. This allows the program to happily launch its graphical interface while in batch mode. Note It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". xvfb can help avoid these issues. This tool is called the X Virtual Frame Buffer or xvfb . It can act as a wrapper to your script which creates a virtual display session. For example, to run an R script (e.g. make_jpeg.R ) which needs a display session in order to save a JPEG file: xvfb-run Rscript make_jpeg.R For more details and other examples see the xvfb-run man page by running man xvfb-run on any compute node.","title":"XVFB"},{"location":"clusters-at-yale/guides/xvfb/#virtual-frame-buffer-for-batch-mode","text":"Often there is a need to run a program with a graphical interface in batch mode. This can be either due to extended run-time or the desire to run many instances of the process at once. In either case the lack of a display can prevent the program from running. A solution has been developed to create a virtual display that only lives in memory. This allows the program to happily launch its graphical interface while in batch mode. Note It is common for R to require a display session to save certain types of figures. You may see a warning like \"unable to start device PNG\" or \"unable to open connection to X11 display\". xvfb can help avoid these issues. This tool is called the X Virtual Frame Buffer or xvfb . It can act as a wrapper to your script which creates a virtual display session. For example, to run an R script (e.g. make_jpeg.R ) which needs a display session in order to save a JPEG file: xvfb-run Rscript make_jpeg.R For more details and other examples see the xvfb-run man page by running man xvfb-run on any compute node.","title":"Virtual Frame Buffer for Batch Mode"},{"location":"clusters-at-yale/job-scheduling/","text":"Run Jobs with Slurm Performing computational work at scale in a shared environment involves organizing everyone's work into jobs and scheduling them. We use Slurm to schedule and manage jobs on the YCRC clusters . Submitting a job involves specifying a resource request then running one or more commands or applications. 
These requests take the form of options to the command-line programs salloc and sbatch or those same options as directives inside submission scripts. Requests are made of groups of compute nodes (servers) called partitions. Partitions, their defaults, limits, and purposes are listed on each cluster page . Once submitted, jobs wait in a queue and are subject to several factors affecting scheduling priority . When your scheduled job begins, the commands or applications you specify are run on compute nodes the scheduler found to satisfy your resource request. If the job was submitted as a batch job, output normally printed to the screen will be saved to file. Please be a good cluster citizen. Do not run heavy computation on login nodes (e.g. grace1 , login1.mccleary ). Doing so negatively impacts everyone's ability to interact with the cluster. Make resource requests for your jobs that reflect what they will use. Wasteful job allocations slow down everyone's work on the clusters. See our documentation on Monitoring CPU and Memory Usage for how to measure job resource usage. If you plan to run many similar jobs, use our Dead Simple Queue tool or job arrays - we enforce limits on job submission rates on all clusters. If you find yourself wondering how best to schedule a job, please contact us for some help. Common Slurm Commands For an exhaustive list of commands and their official manuals, see the SchedMD Man Pages . Below are some of the most common commands used to interact with the scheduler. Submit a script called my_job.sh as a job ( see below for details): sbatch my_job.sh List your queued and running jobs: squeue --me Cancel a queued job or kill a running job, e.g. a job with ID 12345: scancel 12345 Check status of a job, e.g. a job with ID 12345: sacct -j 12345 Check how efficiently a job ran, e.g. a job with ID 12345: seff 12345 See our Monitor CPU and Memory page for more on tracking the resources your job actually uses. Common Job Request Options These options modify the size, length and behavior of jobs you submit. They can be specified when calling salloc or sbatch , or saved to a batch script . Options specified on the command line to sbatch will override those in a batch script. See our Request Compute Resources page for discussion on the differences between --ntasks and --cpus-per-task , constraints, GPUs, etc. If options are left unspecified defaults are used. Long Option Short Option Default Description --job-name -J Name of script Custom job name. --output -o \"slurm-%j.out\" Where to save stdout and stderr from the job. See filename patterns for more formatting options. --partition -p Varies by cluster Partition to run on. See individual cluster pages for details. --account -A Your group name Specify if you have access to multiple private partitions. --time -t Varies by partition Time limit for the job in D-HH:MM:SS, e.g. -t 1- is one day, -t 4:00:00 is 4 hours. --nodes -N 1 Total number of nodes. --ntasks -n 1 Number of tasks (MPI workers). --ntasks-per-node Scheduler decides Number of tasks per node. --cpus-per-task -c 1 Number of CPUs for each task. Use this for threads/cores in single-node jobs. --mem-per-cpu 5G Memory requested per CPU in MiB. Add G to specify GiB (e.g. 10G ). --mem Memory requested per node in MiB. Add G to specify GiB (e.g. 10G ). --gpus -G Used to request GPUs --constraint -C Constraints on node features. To limit kinds of nodes to run on. --mail-user Your Yale email Mail address (alternatively, put your email address in ~/.forward). 
--mail-type None Send email when jobs change state. Use ALL to receive email notifications at the beginning and end of the job. Interactive Jobs Interactive jobs can be used for testing and troubleshooting code. Requesting an interactive job will allocate resources and log you into a shell on a compute node. You can start an interactive job using the salloc command. Unless specified otherwise using the -p flag (see above), all salloc requests will go to the devel ( interactive on Milgram and Ruddle) partition on the cluster. For example, to request an interactive job with 8GB of RAM for 2 hours: salloc -t 2 :00:00 --mem = 8G This will assign one CPU and 8GiB of RAM to you for two hours. You can run commands in this shell as needed. To exit, you can type exit or Ctrl + d Use tmux with Interactive Sessions Remote sessions are vulnerable to being killed if you lose your network connection. We recommend using tmux alleviate this. When using tmux with interactive jobs, please take extra care to stop jobs that are no longer needed. Graphical applications Many graphical applications are well served with the Open OnDemand Remote Desktop app . If you would like to use X11 forwarding, first make sure it is installed and configured . Then, add the --x11 flag to an interactive job request: salloc --x11 Batch Jobs You can submit a script as a batch job, i.e. one that can be run non-interactively in batches. These submission scripts are comprised of three parts: A hashbang line specifying the program that runs the script. This is normally #!/bin/bash . Directives that list job request options. These lines must appear before any other commands or definitions, otherwise they will be ignored. The commands or applications you want executed during your job. See our page of Submission Script Examples for a few more, or the example scripts repo for more in-depth examples. Here is an example submission script that prints some job information and exits: #!/bin/bash #SBATCH --job-name=example_job #SBATCH --time=2:00:00 #SBATCH --mail-type=ALL module purge module load MATLAB/2021a matlab -batch \"your_script\" Save this file as example_job.sh , then submit it with: sbatch example_job.sh When the job finishes the output should be stored in a file called slurm-jobid.out , where jobid is the submitted job's ID. If you find yourself writing loops to submit jobs, instead use our Dead Simple Queue tool or job arrays .","title":"Run Jobs with Slurm"},{"location":"clusters-at-yale/job-scheduling/#run-jobs-with-slurm","text":"Performing computational work at scale in a shared environment involves organizing everyone's work into jobs and scheduling them. We use Slurm to schedule and manage jobs on the YCRC clusters . Submitting a job involves specifying a resource request then running one or more commands or applications. These requests take the form of options to the command-line programs salloc and sbatch or those same options as directives inside submission scripts. Requests are made of groups of compute nodes (servers) called partitions. Partitions, their defaults, limits, and purposes are listed on each cluster page . Once submitted, jobs wait in a queue and are subject to several factors affecting scheduling priority . When your scheduled job begins, the commands or applications you specify are run on compute nodes the scheduler found to satisfy your resource request. If the job was submitted as a batch job, output normally printed to the screen will be saved to file. Please be a good cluster citizen. 
Do not run heavy computation on login nodes (e.g. grace1 , login1.mccleary ). Doing so negatively impacts everyone's ability to interact with the cluster. Make resource requests for your jobs that reflect what they will use. Wasteful job allocations slow down everyone's work on the clusters. See our documentation on Monitoring CPU and Memory Usage for how to measure job resource usage. If you plan to run many similar jobs, use our Dead Simple Queue tool or job arrays - we enforce limits on job submission rates on all clusters. If you find yourself wondering how best to schedule a job, please contact us for some help.","title":"Run Jobs with Slurm"},{"location":"clusters-at-yale/job-scheduling/#common-slurm-commands","text":"For an exhaustive list of commands and their official manuals, see the SchedMD Man Pages . Below are some of the most common commands used to interact with the scheduler. Submit a script called my_job.sh as a job ( see below for details): sbatch my_job.sh List your queued and running jobs: squeue --me Cancel a queued job or kill a running job, e.g. a job with ID 12345: scancel 12345 Check status of a job, e.g. a job with ID 12345: sacct -j 12345 Check how efficiently a job ran, e.g. a job with ID 12345: seff 12345 See our Monitor CPU and Memory page for more on tracking the resources your job actually uses.","title":"Common Slurm Commands"},{"location":"clusters-at-yale/job-scheduling/#common-job-request-options","text":"These options modify the size, length and behavior of jobs you submit. They can be specified when calling salloc or sbatch , or saved to a batch script . Options specified on the command line to sbatch will override those in a batch script. See our Request Compute Resources page for discussion on the differences between --ntasks and --cpus-per-task , constraints, GPUs, etc. If options are left unspecified defaults are used. Long Option Short Option Default Description --job-name -J Name of script Custom job name. --output -o \"slurm-%j.out\" Where to save stdout and stderr from the job. See filename patterns for more formatting options. --partition -p Varies by cluster Partition to run on. See individual cluster pages for details. --account -A Your group name Specify if you have access to multiple private partitions. --time -t Varies by partition Time limit for the job in D-HH:MM:SS, e.g. -t 1- is one day, -t 4:00:00 is 4 hours. --nodes -N 1 Total number of nodes. --ntasks -n 1 Number of tasks (MPI workers). --ntasks-per-node Scheduler decides Number of tasks per node. --cpus-per-task -c 1 Number of CPUs for each task. Use this for threads/cores in single-node jobs. --mem-per-cpu 5G Memory requested per CPU in MiB. Add G to specify GiB (e.g. 10G ). --mem Memory requested per node in MiB. Add G to specify GiB (e.g. 10G ). --gpus -G Used to request GPUs --constraint -C Constraints on node features. To limit kinds of nodes to run on. --mail-user Your Yale email Mail address (alternatively, put your email address in ~/.forward). --mail-type None Send email when jobs change state. Use ALL to receive email notifications at the beginning and end of the job.","title":"Common Job Request Options"},{"location":"clusters-at-yale/job-scheduling/#interactive-jobs","text":"Interactive jobs can be used for testing and troubleshooting code. Requesting an interactive job will allocate resources and log you into a shell on a compute node. You can start an interactive job using the salloc command. 
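In a rough sketch (the job ID and node name are illustrative, and output is abridged), your prompt moves from the login node to a compute node once the allocation is granted:
[netid@grace1 ~]$ salloc
salloc: Granted job allocation 12345678
[netid@c05n03 ~]$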
Unless specified otherwise using the -p flag (see above), all salloc requests will go to the devel ( interactive on Milgram and Ruddle) partition on the cluster. For example, to request an interactive job with 8GB of RAM for 2 hours: salloc -t 2 :00:00 --mem = 8G This will assign one CPU and 8GiB of RAM to you for two hours. You can run commands in this shell as needed. To exit, you can type exit or Ctrl + d Use tmux with Interactive Sessions Remote sessions are vulnerable to being killed if you lose your network connection. We recommend using tmux alleviate this. When using tmux with interactive jobs, please take extra care to stop jobs that are no longer needed.","title":"Interactive Jobs"},{"location":"clusters-at-yale/job-scheduling/#graphical-applications","text":"Many graphical applications are well served with the Open OnDemand Remote Desktop app . If you would like to use X11 forwarding, first make sure it is installed and configured . Then, add the --x11 flag to an interactive job request: salloc --x11","title":"Graphical applications"},{"location":"clusters-at-yale/job-scheduling/#batch-jobs","text":"You can submit a script as a batch job, i.e. one that can be run non-interactively in batches. These submission scripts are comprised of three parts: A hashbang line specifying the program that runs the script. This is normally #!/bin/bash . Directives that list job request options. These lines must appear before any other commands or definitions, otherwise they will be ignored. The commands or applications you want executed during your job. See our page of Submission Script Examples for a few more, or the example scripts repo for more in-depth examples. Here is an example submission script that prints some job information and exits: #!/bin/bash #SBATCH --job-name=example_job #SBATCH --time=2:00:00 #SBATCH --mail-type=ALL module purge module load MATLAB/2021a matlab -batch \"your_script\" Save this file as example_job.sh , then submit it with: sbatch example_job.sh When the job finishes the output should be stored in a file called slurm-jobid.out , where jobid is the submitted job's ID. If you find yourself writing loops to submit jobs, instead use our Dead Simple Queue tool or job arrays .","title":"Batch Jobs"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/","text":"Common Job Failures Your jobs haven't failed, you have just found ways to run them that won't work. Here are some common error messages and steps to correct them. Memory Limits Jobs can fail due to an insufficient memory being requested. Depending on the job, this failure might present as a Slurm error: slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler. This means Slurm detected the job hitting the maximum requested memory and then the job was killed. When process inside a job tries to access memory outside what was allocated to that job (more than what you requested) the operating system tells your program that address is invalid with the fault Bus Error . A similar fault you might be more familiar with is a Segmentation Fault , which usually results from a program incorrectly trying to access a valid memory address. These errors can be fixed in two ways. 
Request More Memory The default is almost always --mem-per-cpu=5G In a batch script: #SBATCH --mem-per-cpu=8G In an interactive job: salloc --mem-per-cpu = 8G Use Less Memory This method is usually a little more involved, and is easier if you can inspect the code you are using. Watching your job's resource usage , attending a workshop , or getting in touch with us are good places to start. Disk Quotas Since the clusters are shared resources, we have quotas in place to enforce fair use of storage. When you or your group reach a quota, you can't write to existing files or create new ones. Any jobs that depend on creating or writing files that count toward the affected quota will fail. To inspect your current usage, run the command getquota . Remember, your home quota is yours but your project, scratch60, and any purchased storage quotas are shared across your group. Archive Files You may find that some files or direcories for previous projects are no longer needed on the cluster. We recommend you archive these to recover space. Delete Files If you are sure you no longer need some files or direcories, you can delete them. Unless files are in your home directory (not project or scratch60 ) they are not backed up and may be unrecoverable. Use the rm -rf command very carefully. Buy More Space If you would like to purchase more than the default quotas, we can help you buy space on the clusters . Rate Limits We rate-limit job submissions to 200 jobs per hour on each cluster. This limit helps even out load on the scheduler and encourages good practice. When you hit this limit, you will get an error when submitting new jobs that looks like this: sbatch: error: Reached jobs per hour limit sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) You will then need to wait until your submission rate drops. Use Job Arrays To avoid hitting this limit and make large numbers of jobs more manageable, you should use Dead Simple Queue or job arrays . If you need help adapting your workflow to dsq or job arrays contact us . Software Modules We build and organize software modules on the cluster using toolchains . The major toolchains we use produce modules that end in foss-yearletter or intel-yearletter, e.g. foss-2018b or intel-2018a . If modules from different toolchains are loaded at the same time, the conflicts that arise often lead to errors or strange application behavior. Seeing either of the following messages is a sign that you are loading incompatible modules. The following have been reloaded with a version change: 1) FFTW/3.3.7-gompi-2018a => FFTW/3.3.8-gompi-2018b 2) GCC/6.4.0-2.28 => GCC/7.3.0-2.3.0 3) GCCcore/6.4.0 => GCCcore/7.3.0 ... or GCCcore/7.3.0 exists but could not be loaded as requested. Match or Purge Your Toolchains Where possible, only use one toolchain at a time. When you want to use software from muliple toolchains run module purge between running new module load commands. If your work requires a version of software that is not installed, contact us . Conda Environments Conda environments provide a nice way to manage python and R packages and modules. Conda acieves this by setting functions and environment variables that point to your environment files when you run conda activate . Unlike modules , conda environments are not completely forwarded into a job; having a conda environment loaded when you submit a job doesn't forward it well into your job. 
You will likely see messages about missing packages and libraries you definitely installed into the environment you want to use in your job. Load Conda Environments Right Before Use To make sure that your environment is set up properly for interactive use, wait until you are on the host you plan to use your environment on. Then run conda activate my_env . To make sure batch jobs function properly, only submit jobs without an environment loaded ( conda deactivate before sbatch ). Make sure you load miniconda and your environment in the body of your batch submission script.","title":"Common Job Failures"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#common-job-failures","text":"Your jobs haven't failed, you have just found ways to run them that won't work. Here are some common error messages and steps to correct them.","title":"Common Job Failures"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#memory-limits","text":"Jobs can fail due to an insufficient memory being requested. Depending on the job, this failure might present as a Slurm error: slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler. This means Slurm detected the job hitting the maximum requested memory and then the job was killed. When process inside a job tries to access memory outside what was allocated to that job (more than what you requested) the operating system tells your program that address is invalid with the fault Bus Error . A similar fault you might be more familiar with is a Segmentation Fault , which usually results from a program incorrectly trying to access a valid memory address. These errors can be fixed in two ways.","title":"Memory Limits"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#request-more-memory","text":"The default is almost always --mem-per-cpu=5G In a batch script: #SBATCH --mem-per-cpu=8G In an interactive job: salloc --mem-per-cpu = 8G","title":"Request More Memory"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#use-less-memory","text":"This method is usually a little more involved, and is easier if you can inspect the code you are using. Watching your job's resource usage , attending a workshop , or getting in touch with us are good places to start.","title":"Use Less Memory"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#disk-quotas","text":"Since the clusters are shared resources, we have quotas in place to enforce fair use of storage. When you or your group reach a quota, you can't write to existing files or create new ones. Any jobs that depend on creating or writing files that count toward the affected quota will fail. To inspect your current usage, run the command getquota . Remember, your home quota is yours but your project, scratch60, and any purchased storage quotas are shared across your group.","title":"Disk Quotas"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#archive-files","text":"You may find that some files or direcories for previous projects are no longer needed on the cluster. We recommend you archive these to recover space.","title":"Archive Files"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#delete-files","text":"If you are sure you no longer need some files or direcories, you can delete them. Unless files are in your home directory (not project or scratch60 ) they are not backed up and may be unrecoverable. 
Use the rm -rf command very carefully.","title":"Delete Files"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#buy-more-space","text":"If you would like to purchase more than the default quotas, we can help you buy space on the clusters .","title":"Buy More Space"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#rate-limits","text":"We rate-limit job submissions to 200 jobs per hour on each cluster. This limit helps even out load on the scheduler and encourages good practice. When you hit this limit, you will get an error when submitting new jobs that looks like this: sbatch: error: Reached jobs per hour limit sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) You will then need to wait until your submission rate drops.","title":"Rate Limits"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#use-job-arrays","text":"To avoid hitting this limit and make large numbers of jobs more manageable, you should use Dead Simple Queue or job arrays . If you need help adapting your workflow to dsq or job arrays contact us .","title":"Use Job Arrays"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#software-modules","text":"We build and organize software modules on the cluster using toolchains . The major toolchains we use produce modules that end in foss-yearletter or intel-yearletter, e.g. foss-2018b or intel-2018a . If modules from different toolchains are loaded at the same time, the conflicts that arise often lead to errors or strange application behavior. Seeing either of the following messages is a sign that you are loading incompatible modules. The following have been reloaded with a version change: 1) FFTW/3.3.7-gompi-2018a => FFTW/3.3.8-gompi-2018b 2) GCC/6.4.0-2.28 => GCC/7.3.0-2.3.0 3) GCCcore/6.4.0 => GCCcore/7.3.0 ... or GCCcore/7.3.0 exists but could not be loaded as requested.","title":"Software Modules"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#match-or-purge-your-toolchains","text":"Where possible, only use one toolchain at a time. When you want to use software from muliple toolchains run module purge between running new module load commands. If your work requires a version of software that is not installed, contact us .","title":"Match or Purge Your Toolchains"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#conda-environments","text":"Conda environments provide a nice way to manage python and R packages and modules. Conda acieves this by setting functions and environment variables that point to your environment files when you run conda activate . Unlike modules , conda environments are not completely forwarded into a job; having a conda environment loaded when you submit a job doesn't forward it well into your job. You will likely see messages about missing packages and libraries you definitely installed into the environment you want to use in your job.","title":"Conda Environments"},{"location":"clusters-at-yale/job-scheduling/common-job-failures/#load-conda-environments-right-before-use","text":"To make sure that your environment is set up properly for interactive use, wait until you are on the host you plan to use your environment on. Then run conda activate my_env . To make sure batch jobs function properly, only submit jobs without an environment loaded ( conda deactivate before sbatch ). 
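For example, a submission script might look like this rough sketch (my_env and my_script.py are placeholders for your environment and code):
#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --time=1:00:00
module load miniconda
source activate my_env
python my_script.py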
Make sure you load miniconda and your environment in the body of your batch submission script.","title":"Load Conda Environments Right Before Use"},{"location":"clusters-at-yale/job-scheduling/dependency/","text":"Jobs with Dependencies SLURM offers a tool which can help string jobs together via dependencies. When submitting a job, you can specify that it should wait to run until a specified job has finished. This provides a mechanism to create simple pipelines for managing complicated workflows. Simple Pipeline As a toy example, consider a two-step pipeline, first a data transfer followed by an analysis step. Here we will use the --dependency flag for sbatch and the afterok type that requires a job to finish successfully before starting the second step: The first step is controlled by a sbatch submission script called step1.sh : #!/bin/bash #SBATCH --job-name=DataTransfer #SBATCH -t 30:00 rsync -avP remote_host:/path/to/data.csv $HOME /project/ The second step is controlled by step2.sh : #!/bin/bash #SBATCH --job-name=DataProcess #SBATCH -t 5:00:00 module load miniconda source activate my_env python my_script.py $HOME /project/data.csv When we submit the first step (using the command sbatch step1.sh ) we obtain the jobid number for that job. We then submit the second step adding in the --dependency flag to tell Slurm that this job requires the first job to finish before it can start: sbatch --dependency = afterok:56761133 step2.sh When the 'transfer' job finishes successfully (without an error exit code) the 'processing' step will begin. While this is a simple dependency structure, it is possible to have multiple dependencies or more complicated structure. Job Clean-up One frequent use-case is a clean-up job that runs after all other jobs have finished. This is a common way to collect results from processing multiple files into a single output file. This can be done using the --dependency=singleton: flag that will wait until all previously launched jobs with the same name and user have finished. [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 [ tl397@grace1 ~ ] $ sbatch --dependency = singleton --job-name = JobName cleanup.sh [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345680 day JobName tl397 R 2020 -05-27T11:54 ( Dependency ) This last job will wait to run until all previous jobs with name JobName finish. Further Reading SLURM provides a number of options for logic controlling dependencies. Most common are the two discussed above, but --dependency=afternotok: can be useful to control behavior if a job fails. Full discussion of the options can be found on the SLURM manual page for sbatch (https://slurm.schedmd.com/sbatch.html). 
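One way to script the two-step submission above (a sketch, not something the pipeline requires) is to capture the first job's ID with sbatch's --parsable option:
# Submit step 1 and keep only its job ID
jobid=$(sbatch --parsable step1.sh)
# Step 2 stays pending until step 1 finishes successfully
sbatch --dependency=afterok:${jobid} step2.sh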
A very detailed overview, with examples in both bash and python, can also be found at the NIH computing reference: https://hpc.nih.gov/docs/job_dependencies.html.","title":"Jobs with Dependencies"},{"location":"clusters-at-yale/job-scheduling/dependency/#jobs-with-dependencies","text":"SLURM offers a tool which can help string jobs together via dependencies. When submitting a job, you can specify that it should wait to run until a specified job has finished. This provides a mechanism to create simple pipelines for managing complicated workflows.","title":"Jobs with Dependencies"},{"location":"clusters-at-yale/job-scheduling/dependency/#simple-pipeline","text":"As a toy example, consider a two-step pipeline, first a data transfer followed by an analysis step. Here we will use the --dependency flag for sbatch and the afterok type that requires a job to finish successfully before starting the second step: The first step is controlled by a sbatch submission script called step1.sh : #!/bin/bash #SBATCH --job-name=DataTransfer #SBATCH -t 30:00 rsync -avP remote_host:/path/to/data.csv $HOME /project/ The second step is controlled by step2.sh : #!/bin/bash #SBATCH --job-name=DataProcess #SBATCH -t 5:00:00 module load miniconda source activate my_env python my_script.py $HOME /project/data.csv When we submit the first step (using the command sbatch step1.sh ) we obtain the jobid number for that job. We then submit the second step adding in the --dependency flag to tell Slurm that this job requires the first job to finish before it can start: sbatch --dependency = afterok:56761133 step2.sh When the 'transfer' job finishes successfully (without an error exit code) the 'processing' step will begin. While this is a simple dependency structure, it is possible to have multiple dependencies or more complicated structure.","title":"Simple Pipeline"},{"location":"clusters-at-yale/job-scheduling/dependency/#job-clean-up","text":"One frequent use-case is a clean-up job that runs after all other jobs have finished. This is a common way to collect results from processing multiple files into a single output file. This can be done using the --dependency=singleton: flag that will wait until all previously launched jobs with the same name and user have finished. [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 [ tl397@grace1 ~ ] $ sbatch --dependency = singleton --job-name = JobName cleanup.sh [ tl397@grace1 ~ ] $ squeue -u tl397 JOBID PARTITION NAME USER ST SUBMIT_TIME NODELIST ( REASON ) 12345670 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345671 day JobName tl397 R 2020 -05-27T11:54 c01n08 ... 12345678 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345679 day JobName tl397 R 2020 -05-27T11:54 c01n08 12345680 day JobName tl397 R 2020 -05-27T11:54 ( Dependency ) This last job will wait to run until all previous jobs with name JobName finish.","title":"Job Clean-up"},{"location":"clusters-at-yale/job-scheduling/dependency/#further-reading","text":"SLURM provides a number of options for logic controlling dependencies. Most common are the two discussed above, but --dependency=afternotok: can be useful to control behavior if a job fails. 
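For example (the clean-up script name here is a placeholder), a follow-up job that should run only if job 56761133 fails could be submitted as:
# Runs only if job 56761133 exits unsuccessfully
sbatch --dependency=afternotok:56761133 cleanup_on_failure.sh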
Full discussion of the options can be found on the SLURM manual page for sbatch (https://slurm.schedmd.com/sbatch.html). A very detailed overview, with examples in both bash and python, can also be found at the NIH computing reference: https://hpc.nih.gov/docs/job_dependencies.html.","title":"Further Reading"},{"location":"clusters-at-yale/job-scheduling/dsq/","text":"Job Arrays with dSQ Dead Simple Queue is a light-weight tool to help submit large batches of homogenous jobs to a Slurm -based HPC cluster. It wraps around slurm's sbatch to help you submit independent jobs as job arrays . Job arrays have several advantages over submitting your jobs in a loop: Your job array will grow during the run to use available resources, up to a limit you can set. Even if the cluster is busy, you probably get work done because each job from your array can be run independently. Your job will only use the resources needed to complete remaining jobs. It will shrink as your jobs finish, giving you and your peers better access to compute resources. If you run your array on a pre-emptable partition (scavenge on YCRC clusters), only individual jobs are preempted. Your whole array will continue. dSQ adds a few nice features on top of job arrays: Your jobs don't need to know they're running in an array; your job file is a great way to document what was done in a way that you can move to other systems relatively easily. You get a simple report of which job ran where and for how long dSQAutopsy can create a new job file that has only the jobs that didn't complete from your last run. All you need is Python 2.7+, or Python 3. dSQ is not recommended for situations where the initialization of the job takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler. Step 1: Create Your Job File First, you'll need to generate a job file. Each line of this job file needs to specify exactly what you want run for each job, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with # will be ignored when submitting your job array. Note: slurm jobs start in the directory from which your job was submitted. For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that each job needs 4 GiB of RAM, and will run in less than 20 minutes. Create a file with the jobs you want to run, one per line. A simple loop that prints your jobs should usually suffice. A job can be a simple command invocation, or a sequence of commands. You can call the job file anything, but for this example assume it's called \"joblist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... 
module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 Avoid Very Short Jobs When building your job file, please bundle very short jobs (less than a minute) such that each element of the job array will run for at least 10 minutes. You can do this by putting multiple tasks on a single line, separated by a ; . In the same vein, avoid jobs that simply check for a previous successful completion and then exit. See dSQAutopsy below for a way to completely avoid submitting these types of jobs. Our clusters are not tuned for extremely high throughput jobs. Therefore, large numbers of very short jobs put a lot of strain on both the scheduler, resulting in delays in scheduling other users' jobs, and the storage, due to large numbers of I/O operations. Step 2: Generate Batch Script with dsq On YCRC clusters you can load Dead Simple Queue onto your path with: module load dSQ You can also download or clone this repo and use the scripts directly. dsq takes a few arguments, then writes a job submission script (default) or can directly submit a job for you. The resources you request will be given to each job in the array (each line in your job file) , e.g. requesting 2 GiB of RAM with dSQ will run each individual job with a separate 2 GiB of RAM available. Run sbatch --help or see the official Slurm documentation for more info on sbatch options. dSQ will set a default job name of dsq-jobfile (your job file name without the file extension). dSQ will also set the job output file name pattern to dsq-jobfile-%A_%a-%N.out, which will capture each of your jobs' output to a file with the job's ID(%A), its array index or zero-based line number(%a), and the host name of the node it ran on (%N). If you are handling output in each of your jobs, set this to /dev/null , which will stop these files from being created. Required Arguments: --job-file jobs.txt Job file, one self-contained job per line. Optional Arguments: -h, --help Show this help message and exit. --version show program's version number and exit --batch-file sub_script.sh Name for batch script file. Defaults to dsq-jobfile-YYYY-MM-DD.sh -J jobname, --job-name jobname Name of your job array. Defaults to dsq-jobfile --max-jobs number Maximum number of simultaneously running jobs from the job array. -o fmt_string, --output fmt_string Slurm output file pattern. There will be one file per line in your job file. To suppress slurm out files, set this to /dev/null. Defaults to dsq-jobfile-%A_%a-%N.out --status-dir dir Directory to save the job_jobid_status.tsv file to. Defaults to working directory. --suppress-stats-file Don't save job stats to job_jobid_status.tsv --submit Submit the job array on the fly instead of creating a submission script. In the example above, we want walltime of 20 minutes and memory=4GiB per job. Our invocation would be: dsq --job-file joblist.txt --mem-per-cpu 4g -t 20 :00 --mail-type ALL The dsq command will create a file called dsq-joblist-yyyy-mm-dd.sh , where the y, m, and d are today's date. After creating the batch script, take a look at its contents. You can further modify the Slurm directives in this file before submitting. 
#!/bin/bash #SBATCH --array 0-999 #SBATCH --output dsq-joblist-%A_%3a-%N.out #SBATCH --job-name dsq-joblist #SBATCH --mem-per-cpu 4g -t 10:00 --mail-type ALL # DO NOT EDIT LINE BELOW /path/to/dSQBatch.py --job-file /path/to/joblist.txt --status-dir /path/to/here Step 3: Submit Batch Script sbatch dsq-joblist-yyyy-mm-dd.sh Manage Your dSQ Job You can refer to any portion of your job with jobid_index syntax, or the entire array with its jobid. The index Dead Simple Queue uses starts at zero , so the 3rd line in your job file will have an index of 2. You can also specify ranges. # to cancel job 4 for array job 14567 scancel 14567_4 # to cancel jobs 10-20 for job 14567: scancel 14567_ [ 10 -20 ] dSQ Output You can monitor the status of your jobs in Slurm by using squeue -u , squeue -j , or dsqa -j . dSQ creates a file named job_jobid_status.tsv , unless you suppress this output with --supress-stats-file . This file will report the success or failure of each job as it finishes. Note this file will not contain information for any jobs that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed jobs in the following tab-separated columns: Job_ID: the zero-based line number from your job file. Exit_Code: exit code returned from your job (non-zero number generally indicates a failed job). Hostname: The hostname of the compute node that this job ran on. Time_Started: time started, formatted as year-month-day hour:minute:second. Time_Ended: time started, formatted as year-month-day hour:minute:second. Time_Elapsed: in seconds. Job: the line from your job file. dSQAutopsy You can use dSQAutopsy or dsqa to create a simple report of the array of jobs, and a new jobsfile that contains just the jobs you want to re-run if you specify the original jobsfile. Options listed below -j JOB_ID, --job-id JOB_ID The Job ID of a running or completed dSQ Array -f JOB_FILE, --job-file JOB_FILE Job file, one job per line (not your job submission script). -s STATES, --states STATES Comma separated list of states to use for re-writing job file. Default: CANCELLED,NODE_FAIL,PREEMPTED Asking for a simple report: dsqa -j 13233846 Produces one State Summary for Array 13233846 State Num_Jobs Indices ----- -------- ------- COMPLETED 12 4,7-17 RUNNING 5 1-3,5-6 PREEMPTED 1 0 You can redirect the report and the failed jobs to separate files: dsqa -j 2629186 -f jobsfile.txt > re-run_jobs.txt 2 > 2629186_report.txt","title":"Job Arrays with dSQ"},{"location":"clusters-at-yale/job-scheduling/dsq/#job-arrays-with-dsq","text":"Dead Simple Queue is a light-weight tool to help submit large batches of homogenous jobs to a Slurm -based HPC cluster. It wraps around slurm's sbatch to help you submit independent jobs as job arrays . Job arrays have several advantages over submitting your jobs in a loop: Your job array will grow during the run to use available resources, up to a limit you can set. Even if the cluster is busy, you probably get work done because each job from your array can be run independently. Your job will only use the resources needed to complete remaining jobs. It will shrink as your jobs finish, giving you and your peers better access to compute resources. If you run your array on a pre-emptable partition (scavenge on YCRC clusters), only individual jobs are preempted. Your whole array will continue. 
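As noted in Step 1 above, a simple loop is usually enough to generate the job file; here is a sketch based on that bowtie2 example (the sample names, index, and options come from Step 1 and are placeholders):
# Write one job per line to joblist.txt
for i in $(seq 1 1000); do
    echo "module load bowtie2 samtools; bowtie2 -p 8 --local --rg-id sample${i} --rg SM:sample${i} --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample${i}.fastq | samtools view -Shu - | samtools sort - sample${i}"
done > joblist.txt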
dSQ adds a few nice features on top of job arrays: Your jobs don't need to know they're running in an array; your job file is a great way to document what was done in a way that you can move to other systems relatively easily. You get a simple report of which job ran where and for how long dSQAutopsy can create a new job file that has only the jobs that didn't complete from your last run. All you need is Python 2.7+, or Python 3. dSQ is not recommended for situations where the initialization of the job takes most of its execution time and it is re-usable. These situations are much better handled by a worker-based job handler.","title":"Job Arrays with dSQ"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-1-create-your-job-file","text":"First, you'll need to generate a job file. Each line of this job file needs to specify exactly what you want run for each job, including any modules that need to be loaded or modifications to your environment variables. Empty lines or lines that begin with # will be ignored when submitting your job array. Note: slurm jobs start in the directory from which your job was submitted. For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that each job needs 4 GiB of RAM, and will run in less than 20 minutes. Create a file with the jobs you want to run, one per line. A simple loop that prints your jobs should usually suffice. A job can be a simple command invocation, or a sequence of commands. You can call the job file anything, but for this example assume it's called \"joblist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 Avoid Very Short Jobs When building your job file, please bundle very short jobs (less than a minute) such that each element of the job array will run for at least 10 minutes. You can do this by putting multiple tasks on a single line, separated by a ; . In the same vein, avoid jobs that simply check for a previous successful completion and then exit. See dSQAutopsy below for a way to completely avoid submitting these types of jobs. Our clusters are not tuned for extremely high throughput jobs. Therefore, large numbers of very short jobs put a lot of strain on both the scheduler, resulting in delays in scheduling other users' jobs, and the storage, due to large numbers of I/O operations.","title":"Step 1: Create Your Job File"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-2-generate-batch-script-with-dsq","text":"On YCRC clusters you can load Dead Simple Queue onto your path with: module load dSQ You can also download or clone this repo and use the scripts directly. dsq takes a few arguments, then writes a job submission script (default) or can directly submit a job for you. The resources you request will be given to each job in the array (each line in your job file) , e.g. 
requesting 2 GiB of RAM with dSQ will run each individual job with a separate 2 GiB of RAM available. Run sbatch --help or see the official Slurm documentation for more info on sbatch options. dSQ will set a default job name of dsq-jobfile (your job file name without the file extension). dSQ will also set the job output file name pattern to dsq-jobfile-%A_%a-%N.out, which will capture each of your jobs' output to a file with the job's ID(%A), its array index or zero-based line number(%a), and the host name of the node it ran on (%N). If you are handling output in each of your jobs, set this to /dev/null , which will stop these files from being created. Required Arguments: --job-file jobs.txt Job file, one self-contained job per line. Optional Arguments: -h, --help Show this help message and exit. --version show program's version number and exit --batch-file sub_script.sh Name for batch script file. Defaults to dsq-jobfile-YYYY-MM-DD.sh -J jobname, --job-name jobname Name of your job array. Defaults to dsq-jobfile --max-jobs number Maximum number of simultaneously running jobs from the job array. -o fmt_string, --output fmt_string Slurm output file pattern. There will be one file per line in your job file. To suppress slurm out files, set this to /dev/null. Defaults to dsq-jobfile-%A_%a-%N.out --status-dir dir Directory to save the job_jobid_status.tsv file to. Defaults to working directory. --suppress-stats-file Don't save job stats to job_jobid_status.tsv --submit Submit the job array on the fly instead of creating a submission script. In the example above, we want walltime of 20 minutes and memory=4GiB per job. Our invocation would be: dsq --job-file joblist.txt --mem-per-cpu 4g -t 20 :00 --mail-type ALL The dsq command will create a file called dsq-joblist-yyyy-mm-dd.sh , where the y, m, and d are today's date. After creating the batch script, take a look at its contents. You can further modify the Slurm directives in this file before submitting. #!/bin/bash #SBATCH --array 0-999 #SBATCH --output dsq-joblist-%A_%3a-%N.out #SBATCH --job-name dsq-joblist #SBATCH --mem-per-cpu 4g -t 10:00 --mail-type ALL # DO NOT EDIT LINE BELOW /path/to/dSQBatch.py --job-file /path/to/joblist.txt --status-dir /path/to/here","title":"Step 2: Generate Batch Script with dsq"},{"location":"clusters-at-yale/job-scheduling/dsq/#step-3-submit-batch-script","text":"sbatch dsq-joblist-yyyy-mm-dd.sh","title":"Step 3: Submit Batch Script"},{"location":"clusters-at-yale/job-scheduling/dsq/#manage-your-dsq-job","text":"You can refer to any portion of your job with jobid_index syntax, or the entire array with its jobid. The index Dead Simple Queue uses starts at zero , so the 3rd line in your job file will have an index of 2. You can also specify ranges. # to cancel job 4 for array job 14567 scancel 14567_4 # to cancel jobs 10-20 for job 14567: scancel 14567_ [ 10 -20 ]","title":"Manage Your dSQ Job"},{"location":"clusters-at-yale/job-scheduling/dsq/#dsq-output","text":"You can monitor the status of your jobs in Slurm by using squeue -u , squeue -j , or dsqa -j . dSQ creates a file named job_jobid_status.tsv , unless you suppress this output with --supress-stats-file . This file will report the success or failure of each job as it finishes. Note this file will not contain information for any jobs that were canceled (e.g. by the user with scancel) before they began. This file contains details about the completed jobs in the following tab-separated columns: Job_ID: the zero-based line number from your job file. 
Exit_Code: exit code returned from your job (non-zero number generally indicates a failed job). Hostname: The hostname of the compute node that this job ran on. Time_Started: time started, formatted as year-month-day hour:minute:second. Time_Ended: time ended, formatted as year-month-day hour:minute:second. Time_Elapsed: in seconds. Job: the line from your job file.","title":"dSQ Output"},{"location":"clusters-at-yale/job-scheduling/dsq/#dsqautopsy","text":"You can use dSQAutopsy or dsqa to create a simple report of the array of jobs, and a new jobsfile that contains just the jobs you want to re-run if you specify the original jobsfile. Options listed below -j JOB_ID, --job-id JOB_ID The Job ID of a running or completed dSQ Array -f JOB_FILE, --job-file JOB_FILE Job file, one job per line (not your job submission script). -s STATES, --states STATES Comma separated list of states to use for re-writing job file. Default: CANCELLED,NODE_FAIL,PREEMPTED Asking for a simple report: dsqa -j 13233846 Produces one State Summary for Array 13233846 State Num_Jobs Indices ----- -------- ------- COMPLETED 12 4,7-17 RUNNING 5 1-3,5-6 PREEMPTED 1 0 You can redirect the report and the failed jobs to separate files: dsqa -j 2629186 -f jobsfile.txt > re-run_jobs.txt 2 > 2629186_report.txt","title":"dSQAutopsy"},{"location":"clusters-at-yale/job-scheduling/fairshare/","text":"Priority & Wait Time Job Priority Score Fairshare To ensure well-balanced access to cluster resources, we institute a fairshare system on our clusters. In practice this means jobs have a priority score that dictates when they can run in relation to other jobs. This score is affected by the amount of CPU-equivalent hours used by a group in the past few weeks. The number of CPU-equivalents allocated to a job is defined as the larger of (a) the number of requested cores and (b) the total amount of requested memory divided by the default memory per core (usually 5G/core). For example, a job that requests 2 cores and 30G of memory counts as max(2, 30/5) = 6 CPU-equivalents. If a group has used a large amount of CPU-equivalent hours, their jobs are given a lower priority score and therefore will take longer to start if the cluster is busy. Regardless of a job's priority, the scheduler still considers all jobs for backfill (see below). To see all pending jobs sorted by priority (jobs with higher priority at the top), use the following squeue command: squeue --sort=-p -t PD -p <partition> To monitor usage of members of your group, run the sshare command: sshare -a -A <group> Note: Resources used on private partitions do not affect fairshare. Similarly, resources used in the scavenge partition cost 10% of comparable resources in the other partitions. Length of Time in Queue In addition to fairshare, any pending job will accrue priority over time, which can help overcome small fairshare penalties. To see the factors affecting your job's priority, run the following sprio command: sprio -j <jobid> Backfill In addition to the main scheduling cycle, where jobs are run in the order of priority and availability of resources, all jobs are also considered for \"backfill\". Backfill is a mechanism which will let jobs with a lower priority score start before high priority jobs if they can fit in around them. For example, if a higher priority job needs 4 nodes with 20 cores on each node but will have to wait 30 hours for those resources to become available, and a lower priority job only needs a couple of cores for an hour, Slurm will run the lower priority job in the meantime. For this reason, it is important to request accurate walltime limits for your jobs. 
If your job only requires 2 hours to run, but you request 24 hours, the likelihood that your job will be backfilled is greatly lowered. Moreover, for performance reasons, the backfill scheduler on Grace only looks at the top 10 jobs by each user. Therefore, if you bundle similar jobs into job arrays (see dSQ ), the backfill cycle will consider more of your jobs since entire job arrays only count as one job for the limit accounting.","title":"Priority & Wait Time"},{"location":"clusters-at-yale/job-scheduling/fairshare/#priority-wait-time","text":"","title":"Priority & Wait Time"},{"location":"clusters-at-yale/job-scheduling/fairshare/#job-priority-score","text":"","title":"Job Priority Score"},{"location":"clusters-at-yale/job-scheduling/fairshare/#fairshare","text":"To ensure well-balanced access to cluster resources, we institute a fairshare system on our clusters. In practice this means jobs have a priority score that dictates when they can run in relation to other jobs. This score is affected by the amount of CPU-equivalent hours used by a group in the past few weeks. The number of CPU-equivalents allocated to a job is defined as the larger of (a) the number of requested cores and (b) the total amount of requested memory divided by the default memory per core (usually 5G/core). For example, a job that requests 2 cores and 30G of memory counts as max(2, 30/5) = 6 CPU-equivalents. If a group has used a large amount of CPU-equivalent hours, their jobs are given a lower priority score and therefore will take longer to start if the cluster is busy. Regardless of a job's priority, the scheduler still considers all jobs for backfill (see below). To see all pending jobs sorted by priority (jobs with higher priority at the top), use the following squeue command: squeue --sort=-p -t PD -p <partition> To monitor usage of members of your group, run the sshare command: sshare -a -A <group> Note: Resources used on private partitions do not affect fairshare. Similarly, resources used in the scavenge partition cost 10% of comparable resources in the other partitions.","title":"Fairshare"},{"location":"clusters-at-yale/job-scheduling/fairshare/#length-of-time-in-queue","text":"In addition to fairshare, any pending job will accrue priority over time, which can help overcome small fairshare penalties. To see the factors affecting your job's priority, run the following sprio command: sprio -j <jobid>","title":"Length of Time in Queue"},{"location":"clusters-at-yale/job-scheduling/fairshare/#backfill","text":"In addition to the main scheduling cycle, where jobs are run in the order of priority and availability of resources, all jobs are also considered for \"backfill\". Backfill is a mechanism which will let jobs with a lower priority score start before high priority jobs if they can fit in around them. For example, if a higher priority job needs 4 nodes with 20 cores on each node but will have to wait 30 hours for those resources to become available, and a lower priority job only needs a couple of cores for an hour, Slurm will run the lower priority job in the meantime. For this reason, it is important to request accurate walltime limits for your jobs. If your job only requires 2 hours to run, but you request 24 hours, the likelihood that your job will be backfilled is greatly lowered. Moreover, for performance reasons, the backfill scheduler on Grace only looks at the top 10 jobs by each user. 
Therefore, if you bundle similar jobs into job arrays (see dSQ ), the backfill cycle will consider more of your jobs since entire job arrays only count as one job for the limit accounting.","title":"Backfill"},{"location":"clusters-at-yale/job-scheduling/mpi/","text":"MPI Partition Grace has a special common partition called mpi . The mpi partition is a bit different from other partitions on Grace--it always allocates entire nodes to jobs submitted to the partition. The nodes in the mpi partition are identical: each has 24 cores (2x Skylake Gold 6136) and 96GiB RAM (90GiB usable). While this partition is available to all Grace users, only certain types of jobs are allowed on the partition (similar to the restrictions on our GPU partitions). In addition to the common partition mpi , there is a scavenge_mpi partition. This partition has the same purpose and limitations as the regular mpi partition, but allows users to run at a lower priority (e.g. subject to preemption if nodes are requested in the mpi partition ) without incurring cpu charges. Appropriate Jobs This partition is specifically designed to support jobs that use tightly-coupled MPI-enabled applications that will run across multiple nodes and are sensitive to sharing their nodes with other jobs. Since every node on the mpi partition is identical, it can support workloads that are sensitive to hardware differences across a single job. We expect most jobs submitted to mpi to use all 24 cores on each node. There are occasionally instances where a tightly coupled application will use multiple nodes but less than all 24 cores due to load balancing or memory limitations. For example, some applications require a power of 2 cores in the job, but 24 cores doesn't always divide evenly into those configurations. So we occasionally see jobs that use multiple nodes but only 16 of the 24 cores per node; these are also acceptable submissions to the mpi partition. Jobs that do not require exclusive nodes, even if they use mpirun to launch, will run fine and experience normal wait times in the day and week (and scavenge) partitions. As such, we ask you to protect the special mpi partition nodes for the more resource-sensitive jobs listed above and, therefore, submit any jobs that will not be using whole node(s) to the other partitions. If smaller or single core jobs are submitted to the mpi partition, they may be cancelled without warning. As with our GPU partitions, if you would like to make use of available cores on any mpi nodes for small jobs, the scavenge partition is the correct way to do that. If you have any questions about whether your workload is appropriate for the mpi partition, please contact us . Compilation There is one node in the devel partition that is identical to the mpi partition nodes. If you choose to compile your code with advanced optimization flags specific to the new generation of compute nodes, you can request that node in the devel partition with the -C skylake submission flag. Core Layouts Please review the Request Compute Resources documentation for the appropriate Slurm flags for different types of core and node layouts. If you have any questions, feel free to contact us .","title":"MPI Partition"},{"location":"clusters-at-yale/job-scheduling/mpi/#mpi-partition","text":"Grace has a special common partition called mpi . The mpi partition is a bit different from other partitions on Grace--it always allocates entire nodes to jobs submitted to the partition. 
The nodes in the mpi partition are identical: each has 24 cores (2x Skylake Gold 6136) and 96GiB RAM (90GiB usable). While this partition is available to all Grace users, only certain types of jobs are allowed on the partition (similar to the restrictions on our GPU partitions). In addition to the common partition mpi , there is a scavenge_mpi partition. This partition has the same purpose and limitations as the regular mpi partition, but allows users to run at a lower priority (e.g. subject to preemption if nodes are requested in the mpi partition ) without incurring cpu charges.","title":"MPI Partition"},{"location":"clusters-at-yale/job-scheduling/mpi/#appropriate-jobs","text":"This partition is specifically designed to support jobs that use tightly-coupled MPI-enabled applications that will run across multiple nodes and are sensitive to sharing their nodes with other jobs. Since every node on the mpi partition is identical, it can support workloads that are sensitive to hardware differences across a single job. We expect most jobs submitted to mpi to use all 24 cores on each node. There are occasionally instances where a tightly coupled application will use multiple nodes but less than all 24 cores due to load balancing or memory limitations. For example, some applications require a power of 2 cores in the job, but 24 cores doesn't always divide evenly into those configurations. So we occasionally see jobs that use multiple nodes but only 16 of the 24 cores per node; these are also acceptable submissions to the mpi partition. Jobs that do not require exclusive nodes, even if they use mpirun to launch, will run fine and experience normal wait times in the day and week (and scavenge) partitions. As such, we ask you to protect the special mpi partition nodes for the more resource-sensitive jobs listed above and, therefore, submit any jobs that will not be using whole node(s) to the other partitions. If smaller or single core jobs are submitted to the mpi partition, they may be cancelled without warning. As with our GPU partitions, if you would like to make use of available cores on any mpi nodes for small jobs, the scavenge partition is the correct way to do that. If you have any questions about whether your workload is appropriate for the mpi partition, please contact us .","title":"Appropriate Jobs"},{"location":"clusters-at-yale/job-scheduling/mpi/#compilation","text":"There is one node in the devel partition that is identical to the mpi partition nodes. If you choose to compile your code with advanced optimization flags specific to the new generation of compute nodes, you can request that node in the devel partition with the -C skylake submission flag.","title":"Compilation"},{"location":"clusters-at-yale/job-scheduling/mpi/#core-layouts","text":"Please review the Request Compute Resources documentation for the appropriate Slurm flags for different types of core and node layouts. If you have any questions, feel free to contact us .","title":"Core Layouts"},{"location":"clusters-at-yale/job-scheduling/resource-requests/","text":"Request Compute Resources Request Cores and Nodes When running jobs with Slurm , you must be explicit about requesting CPU cores and nodes. See our page on monitoring usage for tips on verifying your jobs are using the resources you expect. The three options --nodes or -N , --ntasks or -n , and --cpus-per-task or -c can be a bit confusing at first but are necessary to understand for applications that use more than one CPU. 
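To make the guidance for the mpi partition above concrete, here is a minimal sketch of a whole-node submission script (the module name, application, and walltime are placeholders):
#!/bin/bash
#SBATCH --partition=mpi
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=24   # use all 24 cores on each mpi node
#SBATCH --time=12:00:00

# Load whatever toolchain and modules your application was built with
module load my_application_modules

mpirun my_mpi_application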
Tip If your application references threads or cores but makes no mention of MPI, only use --cpus-per-task to request CPUs. You cannot request more cores than there are on a single compute node where your job runs. Multi-thread, Multi-process, and MPI The majority of applications in the world were written to use one or more cores on a single computer. Most can only use one core, and do not benefit from being given more cores. The best way to speed these applications up is to run many separate jobs at once, using Dead Simple Queue or job arrays . If an application is able to use multiple cores, it usually achieves this by either spawning threads and sharing memory (multi-threaded) or starting entire new processes (multi-process). Some applications are written to use the Message Passing Interface (MPI) standard to run across many compute nodes. This allows such applications to scale computation in a way not limited by the number of cores on a single node. MPI translates what Slurm calls tasks to separate workers or processes. Because each of these processes can communicate across compute nodes, Slurm does not constrain them to the same node by default. Though tasks can be distributed across nodes, Slurm will not split the CPUs allocated to individual tasks. For this reason a single task that has multiple CPUs allocated will always be on a single node. In some cases using --ntasks=4 (or -n 4 ) and --cpus-per-task=4 (or -c 4 ) achieves the same job allocation by luck, but you should only use --cpus-per-task when using non-MPI applications to guarantee that the CPUs you expect your program to use are all accessable. Some MPI programs are also multi-threaded, so each process can use multiple CPUs. Only these applications can use --ntasks and --cpus-per-task to run faster. MPI Applications For more control over how Slurm lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. Slurm will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is likely if you request a small number of tasks that they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs on your cluster's nodes and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O or other reasons, you can also set --ntasks-per-node=1 . Note that the following must be true: ntasks-per-node * nodes >= ntasks Hybrid (MPI+OpenMP) Applications For the most predictable performance for hybrid applications, you will need to use all three of the --ntasks , --cpus-per-task , and --nodes flags, where --ntasks equals the number of MPI tasks, --cpus-per-task equals the number of OMP_NUM_THREADS and --nodes is the number of nodes required to fit --ntasks * --cpus-per-task . Request Memory (RAM) Slurm strictly enforces the memory your job can use. If you request 5GiB of memory for your job and the total used by all processes you launch hits that limit, some of your processes may die and you will get errors . Make sure you either request the right amount of memory per core on each node in your job with --mem-per-cpu or memory per node in your job with --mem . You can request more memory than you think you might need for an example job, then make note of its actual usage to better tune future requests for similar jobs. 
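Putting the hybrid and memory guidance together, here is a minimal sketch of a submission script (the application name and the specific counts are placeholders):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8             # total number of MPI tasks
#SBATCH --cpus-per-task=4      # OpenMP threads per MPI task
#SBATCH --mem-per-cpu=5G       # memory is requested per allocated CPU

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun my_hybrid_application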
Request GPUs Some of our clusters have nodes that contain GPU co-processors. Please refer to the individual cluster pages regarding node configurations that include GPUs. There are several salloc / sbatch options that allow you to request GPUs and specify your job layout relative to the GPUs requested. Long Option Short Option Description --cpus-per-gpu Use instead of --cpus-per-task to specify number of CPUs per allocated GPU. --gpus -G Specify the total number of GPUs required for the job either with number or type:number. --gpus-per-node Specify the number of GPUs per node , either with number or type:number. New option similar to --gres=gpu . --gpus-per-task Specify the number of GPUs per task , either with number or type:number. --mem-per-gpu * Request system memory that scales per GPU. The --mem , --mem-per-cpu and --mem-per-gpu options are mutually exclusive --constraint -C Request a selection of GPU types (separate types with | ). This option requires the --gpus option for GPU selection. * The --mem-per-gpu flag does not currently work as intended, please do not use. Request memory using --mem or --mem-per-cpu in the meantime. In order for your job to be able to access gpus, you must submit your job to a partition that contains nodes with GPUs and request them - the default GPU request for jobs is to not request any . Some applications require double-precision capable GPUs. If yours does, see the next section for using \"features\" to request any node with compatible GPUs. The Slurm options --mem , --mem-per-gpu and --mem-per-cpu do not request memory on GPUs, sometimes called vRAM. Instead you are allocated the GPU(s) requested and all attached GPU memory for your jobs. Memory accessible on GPUs is limited by their model, and is also listed on each cluster page. Request Specific GPU Types If your job can only run on a subset of the GPU types available in the partition, you can request one or more specific types of GPUs. To request a specific type of GPU, use type:number notation. For example, to request an NVIDIA P100. sbatch --cpus-per-gpu=2 --gpus=p100:1 --time=6:00:00 --partition gpu my_gpu_job.sh To submit your job to a number of GPU options (such as NVIDIA P100, V100 or A100), use a combination of the constraint flag ( -C ) and the --gpus flag (with just a number). For the constraint flag , separate the different GPU type names with the pipe character ( | ). Your job will then start on a node with any of those GPU types. This is not guaranteed to work as expected if you are requesting multiple nodes. GPU type names can be found in the partition tables on each respective cluster page. sbatch -C \"p100|v100|a100\" --gpus=1 --time=6:00:00 --partition gpu my_gpu_job.sh Tip As with requesting multiple cores or multiple nodes, we strongly recommend that you test your jobs using the gpu_devel partition to make sure they can well utilize multiple GPUs before requesting them; allocating more GPUs does not speed up code that can only use one at a time. Here is an example interactive request that would allocate two GPUs and four CPUs for thirty minutes: salloc --cpus-per-gpu=2 --gpus=2 --time=30:00 --partition gpu_devel For more documentation on using GPUs on our clusters, please see GPUs and CUDA . Features and Constraints You may want to run programs that require specific hardware. To ensure your job runs on specific types of nodes, use the --constraint flag. You can use the processor codename (e.g. haswell ) or processor type (e.g. 
E5-2660_v3 ) to limit your job to specific node types. You can also specify an instruction set (e.g. avx512 ) to require that no matter what CPU your job runs on, it must understand at least these instructions. See the individual cluster pages for the exact tags for the different node types. Multiple requirements (\"AND\") are separated by a comma ( , ) and multiple options (\"OR\") should be separated by the pipe character ( | ). # run on a node with a haswell codenamed CPU (e.g. a E5-2660 v3) sbatch --constraint = haswell submit.sh # only run on nodes with E5-2660 v4 CPUs sbatch --constraint = E5-2660_v4 submit.sh We also have keyword features to help you constrain your jobs to certain categories of nodes. oldest : the oldest generation of node on the cluster. Use this constraint when compiling code if you wish to ensure it can run on any standard node on the cluster. nogpu : nodes without GPUs. standard : nodes without GPUs or extra memory. Useful for protecting special nodes in a private partition for jobs that can use the extra capabilities. singleprecision : nodes with single-precision only capable GPUs (e.g. GTX 1080s, RTX 2080s). doubleprecision : nodes with double-precision capable GPUs (e.g. K80s, P100s and V100s). GPU type (e.g. v100 ): nodes with a specific type of GPU. bigtmp : nodes with at least 1.5T of local storage in /tmp . Useful to ensure that your code will have sufficient space if it uses local storage (e.g. Gaussian's $GAUSS_SCRDIR ). Tip Use the command scontrol show node , replacing with the node's name you're interested in, to see more information about the node including its features.","title":"Request Compute Resources"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-compute-resources","text":"","title":"Request Compute Resources"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-cores-and-nodes","text":"When running jobs with Slurm , you must be explicit about requesting CPU cores and nodes. See our page on monitoring usage for tips on verifying your jobs are using the resources you expect. The three options --nodes or -N , --ntasks or -n , and --cpus-per-task or -c can be a bit confusing at first but are necessary to understand for applications that use more than one CPU. Tip If your application references threads or cores but makes no mention of MPI, only use --cpus-per-task to request CPUs. You cannot request more cores than there are on a single compute node where your job runs.","title":"Request Cores and Nodes"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#multi-thread-multi-process-and-mpi","text":"The majority of applications in the world were written to use one or more cores on a single computer. Most can only use one core, and do not benefit from being given more cores. The best way to speed these applications up is to run many separate jobs at once, using Dead Simple Queue or job arrays . If an application is able to use multiple cores, it usually achieves this by either spawning threads and sharing memory (multi-threaded) or starting entire new processes (multi-process). Some applications are written to use the Message Passing Interface (MPI) standard to run across many compute nodes. This allows such applications to scale computation in a way not limited by the number of cores on a single node. MPI translates what Slurm calls tasks to separate workers or processes. 
Because each of these processes can communicate across compute nodes, Slurm does not constrain them to the same node by default. Though tasks can be distributed across nodes, Slurm will not split the CPUs allocated to individual tasks. For this reason a single task that has multiple CPUs allocated will always be on a single node. In some cases using --ntasks=4 (or -n 4 ) and --cpus-per-task=4 (or -c 4 ) achieves the same job allocation by luck, but you should only use --cpus-per-task when using non-MPI applications to guarantee that the CPUs you expect your program to use are all accessable. Some MPI programs are also multi-threaded, so each process can use multiple CPUs. Only these applications can use --ntasks and --cpus-per-task to run faster.","title":"Multi-thread, Multi-process, and MPI"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#mpi-applications","text":"For more control over how Slurm lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. Slurm will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is likely if you request a small number of tasks that they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs on your cluster's nodes and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O or other reasons, you can also set --ntasks-per-node=1 . Note that the following must be true: ntasks-per-node * nodes >= ntasks","title":"MPI Applications"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#hybrid-mpiopenmp-applications","text":"For the most predictable performance for hybrid applications, you will need to use all three of the --ntasks , --cpus-per-task , and --nodes flags, where --ntasks equals the number of MPI tasks, --cpus-per-task equals the number of OMP_NUM_THREADS and --nodes is the number of nodes required to fit --ntasks * --cpus-per-task .","title":"Hybrid (MPI+OpenMP) Applications"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-memory-ram","text":"Slurm strictly enforces the memory your job can use. If you request 5GiB of memory for your job and the total used by all processes you launch hits that limit, some of your processes may die and you will get errors . Make sure you either request the right amount of memory per core on each node in your job with --mem-per-cpu or memory per node in your job with --mem . You can request more memory than you think you might need for an example job, then make note of its actual usage to better tune future requests for similar jobs.","title":"Request Memory (RAM)"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-gpus","text":"Some of our clusters have nodes that contain GPU co-processors. Please refer to the individual cluster pages regarding node configurations that include GPUs. There are several salloc / sbatch options that allow you to request GPUs and specify your job layout relative to the GPUs requested. Long Option Short Option Description --cpus-per-gpu Use instead of --cpus-per-task to specify number of CPUs per allocated GPU. --gpus -G Specify the total number of GPUs required for the job either with number or type:number. --gpus-per-node Specify the number of GPUs per node , either with number or type:number. 
New option similar to --gres=gpu . --gpus-per-task Specify the number of GPUs per task , either with number or type:number. --mem-per-gpu * Request system memory that scales per GPU. The --mem , --mem-per-cpu and --mem-per-gpu options are mutually exclusive --constraint -C Request a selection of GPU types (separate types with | ). This option requires the --gpus option for GPU selection. * The --mem-per-gpu flag does not currently work as intended, please do not use. Request memory using --mem or --mem-per-cpu in the meantime. In order for your job to be able to access gpus, you must submit your job to a partition that contains nodes with GPUs and request them - the default GPU request for jobs is to not request any . Some applications require double-precision capable GPUs. If yours does, see the next section for using \"features\" to request any node with compatible GPUs. The Slurm options --mem , --mem-per-gpu and --mem-per-cpu do not request memory on GPUs, sometimes called vRAM. Instead you are allocated the GPU(s) requested and all attached GPU memory for your jobs. Memory accessible on GPUs is limited by their model, and is also listed on each cluster page.","title":"Request GPUs"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#request-specific-gpu-types","text":"If your job can only run on a subset of the GPU types available in the partition, you can request one or more specific types of GPUs. To request a specific type of GPU, use type:number notation. For example, to request an NVIDIA P100. sbatch --cpus-per-gpu=2 --gpus=p100:1 --time=6:00:00 --partition gpu my_gpu_job.sh To submit your job to a number of GPU options (such as NVIDIA P100, V100 or A100), use a combination of the constraint flag ( -C ) and the --gpus flag (with just a number). For the constraint flag , separate the different GPU type names with the pipe character ( | ). Your job will then start on a node with any of those GPU types. This is not guaranteed to work as expected if you are requesting multiple nodes. GPU type names can be found in the partition tables on each respective cluster page. sbatch -C \"p100|v100|a100\" --gpus=1 --time=6:00:00 --partition gpu my_gpu_job.sh Tip As with requesting multiple cores or multiple nodes, we strongly recommend that you test your jobs using the gpu_devel partition to make sure they can well utilize multiple GPUs before requesting them; allocating more GPUs does not speed up code that can only use one at a time. Here is an example interactive request that would allocate two GPUs and four CPUs for thirty minutes: salloc --cpus-per-gpu=2 --gpus=2 --time=30:00 --partition gpu_devel For more documentation on using GPUs on our clusters, please see GPUs and CUDA .","title":"Request Specific GPU Types"},{"location":"clusters-at-yale/job-scheduling/resource-requests/#features-and-constraints","text":"You may want to run programs that require specific hardware. To ensure your job runs on specific types of nodes, use the --constraint flag. You can use the processor codename (e.g. haswell ) or processor type (e.g. E5-2660_v3 ) to limit your job to specific node types. You can also specify an instruction set (e.g. avx512 ) to require that no matter what CPU your job runs on, it must understand at least these instructions. See the individual cluster pages for the exact tags for the different node types. Multiple requirements (\"AND\") are separated by a comma ( , ) and multiple options (\"OR\") should be separated by the pipe character ( | ). 
# run on a node with a haswell codenamed CPU (e.g. a E5-2660 v3) sbatch --constraint = haswell submit.sh # only run on nodes with E5-2660 v4 CPUs sbatch --constraint = E5-2660_v4 submit.sh We also have keyword features to help you constrain your jobs to certain categories of nodes. oldest : the oldest generation of node on the cluster. Use this constraint when compiling code if you wish to ensure it can run on any standard node on the cluster. nogpu : nodes without GPUs. standard : nodes without GPUs or extra memory. Useful for protecting special nodes in a private partition for jobs that can use the extra capabilities. singleprecision : nodes with single-precision only capable GPUs (e.g. GTX 1080s, RTX 2080s). doubleprecision : nodes with double-precision capable GPUs (e.g. K80s, P100s and V100s). GPU type (e.g. v100 ): nodes with a specific type of GPU. bigtmp : nodes with at least 1.5T of local storage in /tmp . Useful to ensure that your code will have sufficient space if it uses local storage (e.g. Gaussian's $GAUSS_SCRDIR ). Tip Use the command scontrol show node , replacing with the node's name you're interested in, to see more information about the node including its features.","title":"Features and Constraints"},{"location":"clusters-at-yale/job-scheduling/resource-usage/","text":"Monitor CPU and Memory General Note Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more effeciently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM (aka memory) usage so you can make this happen. Be sure to check the Slurm documentation and the clusters page (especially the partitions and hardware sections) to make sure you are submitting the right jobs to the right hardware. Future Jobs If you launch a program by putting /usr/bin/time in front of it, time will watch your program and provide statistics about the resources it used. For example: [ netid@node ~ ] $ /usr/bin/time -v stress-ng --cpu 8 --timeout 10s stress-ng: info: [ 32574 ] dispatching hogs: 8 cpu stress-ng: info: [ 32574 ] successful run completed in 10 .08s Command being timed: \"stress-ng --cpu 8 --timeout 10s\" User time ( seconds ) : 80 .22 System time ( seconds ) : 0 .04 Percent of CPU this job got: 795 % Elapsed ( wall clock ) time ( h:mm:ss or m:ss ) : 0 :10.09 Average shared text size ( kbytes ) : 0 Average unshared data size ( kbytes ) : 0 Average stack size ( kbytes ) : 0 Average total size ( kbytes ) : 0 Maximum resident set size ( kbytes ) : 6328 Average resident set size ( kbytes ) : 0 Major ( requiring I/O ) page faults: 0 Minor ( reclaiming a frame ) page faults: 30799 Voluntary context switches: 1380 Involuntary context switches: 68 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 To know how much RAM your job used (and what jobs like it will need in the future), look at the \"Maximum resident set size\" Running Jobs If your job is already running, you can check on its usage, but will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. 
To find the node you should ssh to, run: [netid@node ~]$ squeue --me JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21252409 general 12345 netid R 32:17 17 c13n[02-04],c14n[05-10],c16n[03-10] Then use ssh to connect to a node your job is running on from the NODELIST column: [netid@node ~]$ ssh c13n03 [netid@c13n03 ~]$ Once you are on the compute node, run either ps or top . ps ps will give you instantaneous usage every time you run it. Here is some sample ps output: [netid@bigmem01 ~]$ ps -u$USER -o %cpu,rss,args %CPU RSS COMMAND 92.6 79446140 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 94.5 80758040 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.6 79676460 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.5 81243364 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 93.8 80799668 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask ps reports memory used in kilobytes, so each of the 5 matlab processes is using ~77GiB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs. top top runs interactively and shows you live usage statistics. You can press u , enter your netid, then enter to filter just your processes. For Memory usage, the number you are interested in is RES. In the case below, the YEPNEE.exe programs are each consuming ~600MB of memory and each fully utilizing one CPU. You can press ? for help and q to quit. ClusterShell For multi-node jobs clush can be very useful. Please see our guide on how to set up and use ClusterShell . Completed Jobs Slurm records statistics for every job, including how much memory and CPU was used. seff After the job completes, you can run seff to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to. [netid@node ~]$ seff 21294645 Job ID: 21294645 Cluster: mccleary User/Group: rdb9/support State: COMPLETED (exit code 0) Cores: 1 CPU Utilized: 00:15:55 CPU Efficiency: 17.04% of 01:33:23 core-walltime Job Wall-clock time: 01:33:23 Memory Utilized: 446.20 MB Memory Efficiency: 8.71% of 5.00 GiB seff-array For job arrays (see here for details) it is helpful to look at statistics for how resources are used by each element of the array. 
The seff-array tool takes the job ID of the array and then calculates the distribution and average CPU and memory usage: [netid@node ~]$ seff-array 43283382 ========== Max Memory Usage ========== # NumSamples = 90; Min = 896.29 MB; Max = 900.48 MB # Mean = 897.77 MB; Variance = 0.40 MB; SD = 0.63 MB; Median 897.78 MB # each \u220e represents a count of 1 806.6628 - 896.7108 MB [ 2]: \u220e\u220e 896.7108 - 897.1296 MB [ 9]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.1296 - 897.5484 MB [ 21]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.5484 - 897.9672 MB [ 34]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.9672 - 898.3860 MB [ 15]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 898.3860 - 898.8048 MB [ 4]: \u220e\u220e\u220e\u220e 898.8048 - 899.2236 MB [ 1]: \u220e 899.2236 - 899.6424 MB [ 3]: \u220e\u220e\u220e 899.6424 - 900.0612 MB [ 0]: 900.0612 - 900.4800 MB [ 1]: \u220e The requested memory was 2000MB. ========== Elapsed Time ========== # NumSamples = 90; Min = 00:03:25.0; Max = 00:07:24.0 # Mean = 00:05:45.0; SD = 00:01:39.0; Median 00:06:44.0 # each \u220e represents a count of 1 00:03:5.0 - 00:03:48.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:03:48.0 - 00:04:11.0 [ 0]: 00:04:11.0 - 00:04:34.0 [ 0]: 00:04:34.0 - 00:04:57.0 [ 0]: 00:04:57.0 - 00:05:20.0 [ 0]: 00:05:20.0 - 00:05:43.0 [ 0]: 00:05:43.0 - 00:06:6.0 [ 0]: 00:06:6.0 - 00:06:29.0 [ 0]: 00:06:29.0 - 00:06:52.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:06:52.0 - 00:07:15.0 [ 28]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e ******************************************************************************** The requested runtime was 01:00:00. The average runtime was 00:05:45.0. Requesting less time would allow jobs to run more quickly. ******************************************************************************** This shows how efficiently the resource request was for all the jobs in an array. In this example, we see that the average memory usage was just under 1GiB, which is reasonable for the 2GiB requested. However, the requested runtime was for an hour, while the jobs only ran for six minutes. These jobs could have been scheduled more quickly if a more accurate runtime was specified. sacct You can also use the more flexible sacct to get that info, along with other more advanced job queries. Unfortunately, the default output from sacct is not as useful. We recommend setting an environment variable to customize the output. 
[netid@node ~]$ export SACCT_FORMAT=\"JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32\" [netid@node ~]$ sacct -j 21294645 JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES -------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- -------------------------------- 21294645 bash rdb9 interacti+ c06n09 01:33:23 COMPLETED 0:0 cpu=1,mem=5G,node=1,billing=1 21294645.extern extern c06n09 01:33:23 COMPLETED 0:0 716K cpu=1,mem=5G,node=1,billing=1 21294645.0 bash c06n09 01:33:23 COMPLETED 0:0 456908K cpu=1,mem=5G,node=1 You should look at the MaxRSS value to see your memory usage.","title":"Monitor CPU and Memory"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#monitor-cpu-and-memory","text":"","title":"Monitor CPU and Memory"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#general-note","text":"Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more effeciently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM (aka memory) usage so you can make this happen. Be sure to check the Slurm documentation and the clusters page (especially the partitions and hardware sections) to make sure you are submitting the right jobs to the right hardware.","title":"General Note"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#future-jobs","text":"If you launch a program by putting /usr/bin/time in front of it, time will watch your program and provide statistics about the resources it used. For example: [ netid@node ~ ] $ /usr/bin/time -v stress-ng --cpu 8 --timeout 10s stress-ng: info: [ 32574 ] dispatching hogs: 8 cpu stress-ng: info: [ 32574 ] successful run completed in 10 .08s Command being timed: \"stress-ng --cpu 8 --timeout 10s\" User time ( seconds ) : 80 .22 System time ( seconds ) : 0 .04 Percent of CPU this job got: 795 % Elapsed ( wall clock ) time ( h:mm:ss or m:ss ) : 0 :10.09 Average shared text size ( kbytes ) : 0 Average unshared data size ( kbytes ) : 0 Average stack size ( kbytes ) : 0 Average total size ( kbytes ) : 0 Maximum resident set size ( kbytes ) : 6328 Average resident set size ( kbytes ) : 0 Major ( requiring I/O ) page faults: 0 Minor ( reclaiming a frame ) page faults: 30799 Voluntary context switches: 1380 Involuntary context switches: 68 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 To know how much RAM your job used (and what jobs like it will need in the future), look at the \"Maximum resident set size\"","title":"Future Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#running-jobs","text":"If your job is already running, you can check on its usage, but will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. 
To find the node you should ssh to, run: [netid@node ~]$ squeue --me JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21252409 general 12345 netid R 32:17 17 c13n[02-04],c14n[05-10],c16n[03-10] Then use ssh to connect to a node your job is running on from the NODELIST column: [netid@node ~]$ ssh c13n03 [netid@c13n03 ~]$ Once you are on the compute node, run either ps or top .","title":"Running Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#ps","text":"ps will give you instantaneous usage every time you run it. Here is some sample ps output: [netid@bigmem01 ~]$ ps -u$USER -o %cpu,rss,args %CPU RSS COMMAND 92.6 79446140 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 94.5 80758040 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.6 79676460 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 92.5 81243364 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask 93.8 80799668 /gpfs/ysm/apps/hpc/Apps/Matlab/R2016b/bin/glnxa64/MATLAB -dmlworker -nodisplay -r distcomp_evaluate_filetask ps reports memory used in kilobytes, so each of the 5 matlab processes is using ~77GiB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs.","title":"ps"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#top","text":"top runs interactively and shows you live usage statistics. You can press u , enter your netid, then enter to filter just your processes. For Memory usage, the number you are interested in is RES. In the case below, the YEPNEE.exe programs are each consuming ~600MB of memory and each fully utilizing one CPU. You can press ? for help and q to quit.","title":"top"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#clustershell","text":"For multi-node jobs clush can be very useful. Please see our guide on how to set up and use ClusterShell .","title":"ClusterShell"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#completed-jobs","text":"Slurm records statistics for every job, including how much memory and CPU was used.","title":"Completed Jobs"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#seff","text":"After the job completes, you can run seff to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to. [netid@node ~]$ seff 21294645 Job ID: 21294645 Cluster: mccleary User/Group: rdb9/support State: COMPLETED (exit code 0) Cores: 1 CPU Utilized: 00:15:55 CPU Efficiency: 17.04% of 01:33:23 core-walltime Job Wall-clock time: 01:33:23 Memory Utilized: 446.20 MB Memory Efficiency: 8.71% of 5.00 GiB","title":"seff"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#seff-array","text":"For job arrays (see here for details) it is helpful to look at statistics for how resources are used by each element of the array. 
The seff-array tool takes the job ID of the array and then calculates the distribution and average CPU and memory usage: [netid@node ~]$ seff-array 43283382 ========== Max Memory Usage ========== # NumSamples = 90; Min = 896.29 MB; Max = 900.48 MB # Mean = 897.77 MB; Variance = 0.40 MB; SD = 0.63 MB; Median 897.78 MB # each \u220e represents a count of 1 806.6628 - 896.7108 MB [ 2]: \u220e\u220e 896.7108 - 897.1296 MB [ 9]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.1296 - 897.5484 MB [ 21]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.5484 - 897.9672 MB [ 34]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 897.9672 - 898.3860 MB [ 15]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 898.3860 - 898.8048 MB [ 4]: \u220e\u220e\u220e\u220e 898.8048 - 899.2236 MB [ 1]: \u220e 899.2236 - 899.6424 MB [ 3]: \u220e\u220e\u220e 899.6424 - 900.0612 MB [ 0]: 900.0612 - 900.4800 MB [ 1]: \u220e The requested memory was 2000MB. ========== Elapsed Time ========== # NumSamples = 90; Min = 00:03:25.0; Max = 00:07:24.0 # Mean = 00:05:45.0; SD = 00:01:39.0; Median 00:06:44.0 # each \u220e represents a count of 1 00:03:5.0 - 00:03:48.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:03:48.0 - 00:04:11.0 [ 0]: 00:04:11.0 - 00:04:34.0 [ 0]: 00:04:34.0 - 00:04:57.0 [ 0]: 00:04:57.0 - 00:05:20.0 [ 0]: 00:05:20.0 - 00:05:43.0 [ 0]: 00:05:43.0 - 00:06:6.0 [ 0]: 00:06:6.0 - 00:06:29.0 [ 0]: 00:06:29.0 - 00:06:52.0 [ 30]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e 00:06:52.0 - 00:07:15.0 [ 28]: \u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e\u220e ******************************************************************************** The requested runtime was 01:00:00. The average runtime was 00:05:45.0. Requesting less time would allow jobs to run more quickly. ******************************************************************************** This shows how efficiently the resource request was for all the jobs in an array. In this example, we see that the average memory usage was just under 1GiB, which is reasonable for the 2GiB requested. However, the requested runtime was for an hour, while the jobs only ran for six minutes. These jobs could have been scheduled more quickly if a more accurate runtime was specified.","title":"seff-array"},{"location":"clusters-at-yale/job-scheduling/resource-usage/#sacct","text":"You can also use the more flexible sacct to get that info, along with other more advanced job queries. Unfortunately, the default output from sacct is not as useful. We recommend setting an environment variable to customize the output. 
[netid@node ~]$ export SACCT_FORMAT=\"JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32\" [netid@node ~]$ sacct -j 21294645 JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES -------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- -------------------------------- 21294645 bash rdb9 interacti+ c06n09 01:33:23 COMPLETED 0:0 cpu=1,mem=5G,node=1,billing=1 21294645.extern extern c06n09 01:33:23 COMPLETED 0:0 716K cpu=1,mem=5G,node=1,billing=1 21294645.0 bash c06n09 01:33:23 COMPLETED 0:0 456908K cpu=1,mem=5G,node=1 You should look at the MaxRSS value to see your memory usage.","title":"sacct"},{"location":"clusters-at-yale/job-scheduling/scavenge/","text":"Scavenge Partition A scavenge partition is available on all of our clusters. It allows you to (a) run jobs outside of your normal limits (e.g. QOSMaxCpuPerUserLimit ) and (b) use unutilized cores, if available, in any private partition on the cluster. You can also use the scavenge partition to get access to unused cores in special purpose partitions, such as the \"gpu\" or \"mpi\" partitions, and unused GPUs in private partitions. However, any job running in the scavenge partition is subject to preemption if any node in use by the job is required for a job in the node's normal partition. This means that your job may be killed without advance notice, so you should only run jobs in the scavenge partition that either have checkpoint capabilities or that can otherwise be restarted with minimal loss of progress. Warning Not all jobs are a good fit for the scavenge partition, such as jobs with long startup times or jobs that run a long time between checkpoint operations. Automatically Requeue Preempted Jobs If you would like your job to be automatically added back to the queue if preempted, you can add the --requeue flag to your submission script. #SBATCH --requeue Be aware that your job, when started from a requeue, will still re-run the entire original submission script. It will only resume progress if your program has the its own ability to checkpoint and restart from previous progress. Track History of a Requeued Job When a scavenge job is requeued after preemption, it retains the same job id. However, this can make it difficult to track the history of the job (how many times it was requeued, how long it ran for each time). To view the full history of your job use the --duplicates flag for the sacct command. sacct -j --duplicates Scavenge GPUs On Grace and McCleary, we also have a scavenge_gpu partition, that contains all scavenge-able GPU enabled nodes and has higher priority for those node than normal scavenge. In all other ways (e.g. preemption, time limit), scavenge_gpu behaves the same as the normal scavenge partition. You can see the full count of GPU nodes in the Partition tables on the respective cluster pages. Scavenge MPI Nodes On Grace, we have a scavenge_mpi partition, that contains all scavenge-able nodes similar to the mpi partition and has higher priority for those node than normal scavenge. scavenge_mpi is subject to the same preemption model as scavenge and the same use case restrictions as the regular mpi partition (multi-node, tightly couple parallel codes). You can see the full count of MPI nodes in the Partition tables on the respective cluster pages. 
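You can see the full count of MPI nodes in the Partition tables on the respective cluster pages. As a minimal sketch (my_mpi_job.sh is a hypothetical submission script), a two-node job could be submitted there with automatic requeueing using: sbatch --partition=scavenge_mpi --requeue --nodes=2 --ntasks-per-node=24 my_mpi_job.sh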
Research Available Nodes If you are interested in specific hardware and its availability, you can use the sinfo command to query how many of each type of node is available and what features it lists. For example: sinfo -e -o \"%.6D|%c|%G|%b\" | column -ts \"|\" will show you the kinds of nodes available, and sinfo -e -o \"%.6D|%T|%c|%G|%b\" | column -ts \"|\" will break out how many nodes in each state (e.g. allocated, mixed, idle) there are. For more options see the official sinfo documentation .","title":"Scavenge Partition"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-partition","text":"A scavenge partition is available on all of our clusters. It allows you to (a) run jobs outside of your normal limits (e.g. QOSMaxCpuPerUserLimit ) and (b) use unutilized cores, if available, in any private partition on the cluster. You can also use the scavenge partition to get access to unused cores in special purpose partitions, such as the \"gpu\" or \"mpi\" partitions, and unused GPUs in private partitions. However, any job running in the scavenge partition is subject to preemption if any node in use by the job is required for a job in the node's normal partition. This means that your job may be killed without advance notice, so you should only run jobs in the scavenge partition that either have checkpoint capabilities or that can otherwise be restarted with minimal loss of progress. Warning Not all jobs are a good fit for the scavenge partition, such as jobs with long startup times or jobs that run a long time between checkpoint operations.","title":"Scavenge Partition"},{"location":"clusters-at-yale/job-scheduling/scavenge/#automatically-requeue-preempted-jobs","text":"If you would like your job to be automatically added back to the queue if preempted, you can add the --requeue flag to your submission script. #SBATCH --requeue Be aware that your job, when started from a requeue, will still re-run the entire original submission script. It will only resume progress if your program has the its own ability to checkpoint and restart from previous progress.","title":"Automatically Requeue Preempted Jobs"},{"location":"clusters-at-yale/job-scheduling/scavenge/#track-history-of-a-requeued-job","text":"When a scavenge job is requeued after preemption, it retains the same job id. However, this can make it difficult to track the history of the job (how many times it was requeued, how long it ran for each time). To view the full history of your job use the --duplicates flag for the sacct command. sacct -j --duplicates","title":"Track History of a Requeued Job"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-gpus","text":"On Grace and McCleary, we also have a scavenge_gpu partition, that contains all scavenge-able GPU enabled nodes and has higher priority for those node than normal scavenge. In all other ways (e.g. preemption, time limit), scavenge_gpu behaves the same as the normal scavenge partition. You can see the full count of GPU nodes in the Partition tables on the respective cluster pages.","title":"Scavenge GPUs"},{"location":"clusters-at-yale/job-scheduling/scavenge/#scavenge-mpi-nodes","text":"On Grace, we have a scavenge_mpi partition, that contains all scavenge-able nodes similar to the mpi partition and has higher priority for those node than normal scavenge. scavenge_mpi is subject to the same preemption model as scavenge and the same use case restrictions as the regular mpi partition (multi-node, tightly couple parallel codes). 
You can see the full count of MPI nodes in the Partition tables on the respective cluster pages.","title":"Scavenge MPI Nodes"},{"location":"clusters-at-yale/job-scheduling/scavenge/#research-available-nodes","text":"If you are interested in specific hardware and its availability, you can use the sinfo command to query how many of each type of node is available and what features it lists. For example: sinfo -e -o \"%.6D|%c|%G|%b\" | column -ts \"|\" will show you the kinds of nodes available, and sinfo -e -o \"%.6D|%T|%c|%G|%b\" | column -ts \"|\" will break out how many nodes in each state (e.g. allocated, mixed, idle) there are. For more options see the official sinfo documentation .","title":"Research Available Nodes"},{"location":"clusters-at-yale/job-scheduling/scrontab/","text":"Recurring Jobs You can use scrontab to schedule recurring jobs. It uses a syntax similar to crontab , a standard Unix/Linux utility for running programs at specified intervals. scrontab vs crontab If you are familiar with crontab , there are some important differences to note: The scheduled times for scrontab indicate when your job is eligible to start. They are not start times like a traditional Cron jobs. Jobs managed with scrontab won't start if an earlier iteration of the same job is still running. Cron will happily run multiple copies of a job at the same time. You have one scrontab file for the entire cluster, unlike crontabs which are stored locally on each computer. Set Up Your scrontab Edit Your scrontab Run scrontab -e to edit your scrontab file. If you prefer to use nano to edit files, run EDITOR = nano scrontab -e Lines that start with #SCRON are treated like the beginning of a new batch job, and work like #SBATCH directives for batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most common sbatch options just as you would using sbatch on the command line . The first line after your SCRON directives specifies the schedule for your job and the command to run. Note All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir slurm option. Cron syntax Crontab syntax is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io . You can also use the short-hand syntax @hourly , @daily , @weekly , @monthly , and @yearly instead of the five separate columns. What to Run If you're running a script it must be marked as executable. Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file you need to add: source ~/.bashrc To your script to ensure that your environment is set up correctly. Note The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead. Note Your scrontab jobs will appear to have the same JobID every time they run until the next time you edit your scrontab file (they are being requeued). This means that only the most recent job will be logged to the default output file. If you want deeper history, you should redirect output in your scripts to filenames with something more unique in their names, like a date or timestamp, e.g. 
python my_script.py > $( date + \"%Y-%m-%d\" ) _myjob_scrontab.out If you want to see slurm accounting of a job handled by scrontab, for example job 12345 run: sacct --duplicates --jobs 12345 # or with short options sacct -Dj 12345 Examples Run a Daily Simulation This example submits a 6-hour simulation eligible to start every day at 12:00 AM. #SCRON --time 6:00:00 #SCRON --cpus-per-task 4 #SCRON --name \"daily_sim\" #SCRON --chdir /home/netid/project #SCRON -o my_simulations/%j-out.txt @daily ./simulation_v2_final.sh Run a Weekly Transfer Job This example submits a transfer script eligible to start every Wednesday at 8:00 PM. #SCRON --time 1:00:00 #SCRON --partition transfer #SCRON --chdir /home/netid/project/to_transfer #SCRON -o transfer_log_%j.txt 0 20 * * 3 ./rclone_commands.sh Capture output from each run in a separate file Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same jobid. This can be avoided using a redirect to a date-stamped file. 0 20 * * 3 ./commands.sh > myjob_ $( date +%Y%m%d%H%M ) .out","title":"Recurring Jobs"},{"location":"clusters-at-yale/job-scheduling/scrontab/#recurring-jobs","text":"You can use scrontab to schedule recurring jobs. It uses a syntax similar to crontab , a standard Unix/Linux utility for running programs at specified intervals. scrontab vs crontab If you are familiar with crontab , there are some important differences to note: The scheduled times for scrontab indicate when your job is eligible to start. They are not start times like a traditional Cron jobs. Jobs managed with scrontab won't start if an earlier iteration of the same job is still running. Cron will happily run multiple copies of a job at the same time. You have one scrontab file for the entire cluster, unlike crontabs which are stored locally on each computer.","title":"Recurring Jobs"},{"location":"clusters-at-yale/job-scheduling/scrontab/#set-up-your-scrontab","text":"","title":"Set Up Your scrontab"},{"location":"clusters-at-yale/job-scheduling/scrontab/#edit-your-scrontab","text":"Run scrontab -e to edit your scrontab file. If you prefer to use nano to edit files, run EDITOR = nano scrontab -e Lines that start with #SCRON are treated like the beginning of a new batch job, and work like #SBATCH directives for batch jobs. Slurm will ignore #SBATCH directives in scripts you run as scrontab jobs. You can use most common sbatch options just as you would using sbatch on the command line . The first line after your SCRON directives specifies the schedule for your job and the command to run. Note All of your scrontab jobs will start with your home directory as the working directory. You can change this with the --chdir slurm option.","title":"Edit Your scrontab"},{"location":"clusters-at-yale/job-scheduling/scrontab/#cron-syntax","text":"Crontab syntax is specified in five columns, to specify minutes, hours, days of the month, months, and days of the week. Especially at first you may find it easiest to use a helper application to generate your cron date fields, such as crontab-generator or cronhub.io . You can also use the short-hand syntax @hourly , @daily , @weekly , @monthly , and @yearly instead of the five separate columns.","title":"Cron syntax"},{"location":"clusters-at-yale/job-scheduling/scrontab/#what-to-run","text":"If you're running a script it must be marked as executable. 
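For example (a minimal sketch, where my_script.sh stands in for your own script): chmod +x my_script.sh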
Jobs handled by scrontab do not run in a full login shell, so if you have customized your .bashrc file you need to add: source ~/.bashrc To your script to ensure that your environment is set up correctly. Note The command you specify in the scrontab is executed via bash, NOT sbatch. You can list multiple commands separated by ;, and use other shell features, such as redirects. Also, any #SBATCH directives in executed scripts will be ignored. You must use #SCRON in the scrontab file instead. Note Your scrontab jobs will appear to have the same JobID every time they run until the next time you edit your scrontab file (they are being requeued). This means that only the most recent job will be logged to the default output file. If you want deeper history, you should redirect output in your scripts to filenames with something more unique in their names, like a date or timestamp, e.g. python my_script.py > $( date + \"%Y-%m-%d\" ) _myjob_scrontab.out If you want to see slurm accounting of a job handled by scrontab, for example job 12345 run: sacct --duplicates --jobs 12345 # or with short options sacct -Dj 12345","title":"What to Run"},{"location":"clusters-at-yale/job-scheduling/scrontab/#examples","text":"","title":"Examples"},{"location":"clusters-at-yale/job-scheduling/scrontab/#run-a-daily-simulation","text":"This example submits a 6-hour simulation eligible to start every day at 12:00 AM. #SCRON --time 6:00:00 #SCRON --cpus-per-task 4 #SCRON --name \"daily_sim\" #SCRON --chdir /home/netid/project #SCRON -o my_simulations/%j-out.txt @daily ./simulation_v2_final.sh","title":"Run a Daily Simulation"},{"location":"clusters-at-yale/job-scheduling/scrontab/#run-a-weekly-transfer-job","text":"This example submits a transfer script eligible to start every Wednesday at 8:00 PM. #SCRON --time 1:00:00 #SCRON --partition transfer #SCRON --chdir /home/netid/project/to_transfer #SCRON -o transfer_log_%j.txt 0 20 * * 3 ./rclone_commands.sh","title":"Run a Weekly Transfer Job"},{"location":"clusters-at-yale/job-scheduling/scrontab/#capture-output-from-each-run-in-a-separate-file","text":"Normally scrontab will clobber the output file from the previous run on each execution, since each execution uses the same jobid. This can be avoided using a redirect to a date-stamped file. 0 20 * * 3 ./commands.sh > myjob_ $( date +%Y%m%d%H%M ) .out","title":"Capture output from each run in a separate file"},{"location":"clusters-at-yale/job-scheduling/simplequeue/","text":"SimpleQueue SimpleQueue is a tool written here to streamline submission of a large number of jobs using a task file. It has a number of advantages: You can run more of your sequential jobs concurrently, since there is a limit on the number of individual qsubs you can run simultaneously. You only have one job to keep track of. If you need to shut everything down, you only need to kill one job. SimpleQueue keeps track of the status of individual jobs. Note that version 3.0+ of SimpleQueue differs from earlier versions in important ways, in particular the meaning of -n. If you have been using an earlier version, please read the following carefully! SimpleQueue is available as a module on our clusters. Run: module avail simplequeue to locate the simplequeue module on your cluster of choice. Example SimpleQueue Job For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . 
Given some initial testing, you think that 80 cpus working together will be enough to finish the job in a reasonable time. Step 1: Create Task List The first step is to create a file with a list of the \"tasks\" you want to run. Each task corresponds to what you might otherwise have run as a single job. A task can be a simple command invocation, or a sequence of commands. You can call the task file anything, but for this example assume it's called \"tasklist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 For simplicity, we'll assume that tasklist, input fastq files, and indexed genome are in a directory called ~/genome_proj/mapping . Step 2: Create Submission Script Load the SimpleQueue module, then create the launch script using: sqCreateScript -q general -N genome_map -n 80 tasklist.txt > run.sh These parameters specify that the job, named genome_map, will be submitted to the general queue/partition. This job will find 80 free cores, start 80 workers on them, and begin processing tasks from the taskfile tasklist.txt . sqCreateScript takes a number of options. They differ somewhat from cluster to cluster, particularly the default values for queue, walltime, and memory. You can run sqCreateScript without any arguments to see the exact options on your cluster. Usage: -h, --help show this help message and exit -n WORKERS, --workers=WORKERS Number of workers to use. Not required. Defaults to 1. -c CORES, --cores=CORES Number of cores to request per worker. Defaults to 1. -m MEM, --mem=MEM Memory per worker. Not required. Defaults to 1G -w WALLTIME, --walltime=WALLTIME Walltime to request for the Slurm Job in form [[D-]HH:]MM:SS. Not required. Defaults to 1:00:00. -q QUEUE, --queue=QUEUE Name of queue to use. Not required. Defaults to general -N NAME, --name=NAME Base job name to use. Not required. Defaults to SimpleQueue. --logdir=LOGDIR Name of logging directory. Defaults to SQ_Files_${SLURM_JOB_ID}. Step 3: Submit Your Job Now you can simply submit run.sh to the scheduler. All of the important scheduler options (queue, number of tasks, number of cpus per task) will have been set in the script so you needn't worry about them. Shortly after run.sh begins running, you should see a directory appear called SQ_Files_jobid where jobid is the jobid the scheduler assigned your job. This directory contains logs from all the tasks that are run during your job. In addition, there are a few other files that record information about the job as a whole. Of these, the most important one is SQ.log . It should be reviewed if you encounter a problem with a run. Assuming that all goes well, tasks from the tasklist file will be scheduled automatically onto the cpus you acquired until all the tasks have completed. At that time, the job will terminate, and you'll see several summary files: scheduler_jobid_out.txt : this is the stdout from simple queue proper (it is generally empty). 
scheduler_jobid_err.txt : this is the stderr from simple queue proper (it is generally a copy of SQ.log ). tasklist.txt.STATUS : this contains a list of all the tasks that were run, including exit status, start time, end time, pid, node run on, and the command run. tasklist.txt.REMAINING : Failed or uncompleted tasks will be listed in this file in the same format as tasklist, so that those tasks can be easily rerun. You should review the status files related to these tasks to understand why they did not complete. This list is provided for convenience. It is always a good idea to scan tasklist.STATUS to double check which tasks did in fact complete with a normal exit status. tasklist.txt.ROGUES : The simple queue system attempts to ensure that all tasks launched eventually exit (normally or abnormally). If it fails to get confirmation that a task has exited, information about the command will be written to this file. This information can be used to hunt down and kill run away processes. Other Important Options If your individual tasks need more than the default memory allocated on your cluster, you can specify a different value using -m. For example: sqCreateScript -m 10g -n 4 ... tasklist > run.sh would request 10GiB of RAM for each of your workers. If your jobs are themselves multithreaded, you can request that your workers have multiple cores using the -c option: sqCreateScript -c 20 -n 4 ... tasklist > run.sh This would create 4 workers, each having access to 20 cores.","title":"SimpleQueue"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#simplequeue","text":"SimpleQueue is a tool written here to streamline submission of a large number of jobs using a task file. It has a number of advantages: You can run more of your sequential jobs concurrently, since there is a limit on the number of individual qsubs you can run simultaneously. You only have one job to keep track of. If you need to shut everything down, you only need to kill one job. SimpleQueue keeps track of the status of individual jobs. Note that version 3.0+ of SimpleQueue differs from earlier versions in important ways, in particular the meaning of -n. If you have been using an earlier version, please read the following carefully! SimpleQueue is available as a module on our clusters. Run: module avail simplequeue to locate the simplequeue module on your cluster of choice.","title":"SimpleQueue"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#example-simplequeue-job","text":"For example, imagine that you have 1000 fastq files that correspond to individual samples you want to map to a genome with bowtie2 and convert to bam files with samtools . Given some initial testing, you think that 80 cpus working together will be enough to finish the job in a reasonable time.","title":"Example SimpleQueue Job"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-1-create-task-list","text":"The first step is to create a file with a list of the \"tasks\" you want to run. Each task corresponds to what you might otherwise have run as a single job. A task can be a simple command invocation, or a sequence of commands. 
You can call the task file anything, but for this example assume it's called \"tasklist.txt\" and contains: module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1 --rg SM:sample1 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1.fastq - | samtools view -Shu - | samtools sort - sample1 module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample2 --rg SM:sample2 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample2.fastq - | samtools view -Shu - | samtools sort - sample2 ... module load bowtie2 samtools ; bowtie2 -p 8 --local --rg-id sample1000 --rg SM:sample1000 --rg LB:sci_seq --rg PL:ILLUMINA -x my_genome -U sample1000.fastq - | samtools view -Shu - | samtools sort - sample1000 For simplicity, we'll assume that tasklist, input fastq files, and indexed genome are in a directory called ~/genome_proj/mapping .","title":"Step 1: Create Task List"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-2-create-submission-script","text":"Load the SimpleQueue module, then create the launch script using: sqCreateScript -q general -N genome_map -n 80 tasklist.txt > run.sh These parameters specify that the job, named genome_map, will be submitted to the general queue/partition. This job will find 80 free cores, start 80 workers on them, and begin processing tasks from the taskfile tasklist.txt . sqCreateScript takes a number of options. They differ somewhat from cluster to cluster, particularly the default values for queue, walltime, and memory. You can run sqCreateScript without any arguments to see the exact options on your cluster. Usage: -h, --help show this help message and exit -n WORKERS, --workers=WORKERS Number of workers to use. Not required. Defaults to 1. -c CORES, --cores=CORES Number of cores to request per worker. Defaults to 1. -m MEM, --mem=MEM Memory per worker. Not required. Defaults to 1G -w WALLTIME, --walltime=WALLTIME Walltime to request for the Slurm Job in form [[D-]HH:]MM:SS. Not required. Defaults to 1:00:00. -q QUEUE, --queue=QUEUE Name of queue to use. Not required. Defaults to general -N NAME, --name=NAME Base job name to use. Not required. Defaults to SimpleQueue. --logdir=LOGDIR Name of logging directory. Defaults to SQ_Files_${SLURM_JOB_ID}.","title":"Step 2: Create Submission Script"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#step-3-submit-your-job","text":"Now you can simply submit run.sh to the scheduler. All of the important scheduler options (queue, number of tasks, number of cpus per task) will have been set in the script so you needn't worry about them. Shortly after run.sh begins running, you should see a directory appear called SQ_Files_jobid where jobid is the jobid the scheduler assigned your job. This directory contains logs from all the tasks that are run during your job. In addition, there are a few other files that record information about the job as a whole. Of these, the most important one is SQ.log . It should be reviewed if you encounter a problem with a run. Assuming that all goes well, tasks from the tasklist file will be scheduled automatically onto the cpus you acquired until all the tasks have completed. At that time, the job will terminate, and you'll see several summary files: scheduler_jobid_out.txt : this is the stdout from simple queue proper (it is generally empty). scheduler_jobid_err.txt : this is the stderr from simple queue proper (it is generally a copy of SQ.log ). 
tasklist.txt.STATUS : this contains a list of all the tasks that were run, including exit status, start time, end time, pid, node run on, and the command run. tasklist.txt.REMAINING : Failed or uncompleted tasks will be listed in this file in the same format as tasklist, so that those tasks can be easily rerun. You should review the status files related to these tasks to understand why they did not complete. This list is provided for convenience. It is always a good idea to scan tasklist.STATUS to double check which tasks did in fact complete with a normal exit status. tasklist.txt.ROGUES : The simple queue system attempts to ensure that all tasks launched eventually exit (normally or abnormally). If it fails to get confirmation that a task has exited, information about the command will be written to this file. This information can be used to hunt down and kill run away processes.","title":"Step 3: Submit Your Job"},{"location":"clusters-at-yale/job-scheduling/simplequeue/#other-important-options","text":"If your individual tasks need more than the default memory allocated on your cluster, you can specify a different value using -m. For example: sqCreateScript -m 10g -n 4 ... tasklist > run.sh would request 10GiB of RAM for each of your workers. If your jobs are themselves multithreaded, you can request that your workers have multiple cores using the -c option: sqCreateScript -c 20 -n 4 ... tasklist > run.sh This would create 4 workers, each having access to 20 cores.","title":"Other Important Options"},{"location":"clusters-at-yale/job-scheduling/slurm-account/","text":"Slurm Account Coordinator On the clusters the YCRC maintains, we map your linux user and group to your Slurm user and account, which is what actually gives you permission to submit to the various partitions available on the clusters. By changing the Slurm accounts associated with your user, you can modify access to partitions. As a coordinator of an account, you have permission to modify users' association with that account and modify jobs running that are associated with that account. Below are some useful example commands where we use an example user with the name \"be59\" where you are the coordinator of the slurm account \"cryoem\". Add/Remove Users From an Account sacctmgr add user be59 account = cryoem # add user sacctmgr remove user where user = be59 and account = cryoem # remove user Show Account Info sacctmgr show assoc user = be59 # show user associations sacctmgr show assoc account = cryoem # show assocations for account Submit Jobs salloc -A cryoem ... sbatch -A cryoem my_script.sh List Jobs squeue -A cryoem # by account squeue -u be59 # by user Cancel Jobs scancel 1234 # by job ID scancel -u be59 # kill all jobs by user scancel -u be59 --state = running # kill running jobs by user scancel -u be59 --state = pending # kill pending jobs by user scancel -A cryoem # kill all jobs in the account Hold and Release Jobs scontrol hold 1234 # by job ID scontrol release 1234 # remove the hold scontrol uhold 1234 # hold job 1234 but allow the job's owner to release it","title":"Slurm Account Coordinator"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#slurm-account-coordinator","text":"On the clusters the YCRC maintains, we map your linux user and group to your Slurm user and account, which is what actually gives you permission to submit to the various partitions available on the clusters. By changing the Slurm accounts associated with your user, you can modify access to partitions. 
As a coordinator of an account, you have permission to modify users' association with that account and modify jobs running that are associated with that account. Below are some useful example commands where we use an example user with the name \"be59\" where you are the coordinator of the slurm account \"cryoem\".","title":"Slurm Account Coordinator"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#addremove-users-from-an-account","text":"sacctmgr add user be59 account = cryoem # add user sacctmgr remove user where user = be59 and account = cryoem # remove user","title":"Add/Remove Users From an Account"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#show-account-info","text":"sacctmgr show assoc user = be59 # show user associations sacctmgr show assoc account = cryoem # show assocations for account","title":"Show Account Info"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#submit-jobs","text":"salloc -A cryoem ... sbatch -A cryoem my_script.sh","title":"Submit Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#list-jobs","text":"squeue -A cryoem # by account squeue -u be59 # by user","title":"List Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#cancel-jobs","text":"scancel 1234 # by job ID scancel -u be59 # kill all jobs by user scancel -u be59 --state = running # kill running jobs by user scancel -u be59 --state = pending # kill pending jobs by user scancel -A cryoem # kill all jobs in the account","title":"Cancel Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-account/#hold-and-release-jobs","text":"scontrol hold 1234 # by job ID scontrol release 1234 # remove the hold scontrol uhold 1234 # hold job 1234 but allow the job's owner to release it","title":"Hold and Release Jobs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/","text":"Submission Script Examples In addition to those below, we have additional example submission scripts for Parallel R, Matlab and Python . Single threaded programs (basic) #!/bin/bash #SBATCH --job-name=my_job #SBATCH --time=10:00 ./hello.omp Multi-threaded programs #!/bin/bash #SBATCH --job-name=omp_job #SBATCH --output=omp_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./hello.omp Multi-process programs #!/bin/bash #SBATCH --job-name=mpi #SBATCH --output=mpi_job.txt #SBATCH --ntasks=4 #SBATCH --time=10:00 mpirun hello.mpi Tip On Grace's mpi partition, try to make ntasks equal to a multiple of 24. 
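For example (a sketch), a job that fills two of those nodes would set #SBATCH --partition=mpi and #SBATCH --ntasks=48 in a script like the one above.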
Hybrid (MPI+OpenMP) programs #!/bin/bash #SBATCH --job-name=hybrid #SBATCH --output=hydrid_job.txt #SBATCH --ntasks=8 #SBATCH --cpus-per-task=5 #SBATCH --nodes=2 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK mpirun hello_hybrid.mpi GPU job #!/bin/bash #SBATCH --job-name=deep_learn #SBATCH --output=gpu_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=2 #SBATCH --gpus=p100:2 #SBATCH --partition=gpu #SBATCH --time=10:00 module load CUDA module load cuDNN # using your anaconda environment source activate deep-learn python my_tensorflow.py","title":"Submission Script Examples"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#submission-script-examples","text":"In addition to those below, we have additional example submission scripts for Parallel R, Matlab and Python .","title":"Submission Script Examples"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#single-threaded-programs-basic","text":"#!/bin/bash #SBATCH --job-name=my_job #SBATCH --time=10:00 ./hello.omp","title":"Single threaded programs (basic)"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#multi-threaded-programs","text":"#!/bin/bash #SBATCH --job-name=omp_job #SBATCH --output=omp_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./hello.omp","title":"Multi-threaded programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#multi-process-programs","text":"#!/bin/bash #SBATCH --job-name=mpi #SBATCH --output=mpi_job.txt #SBATCH --ntasks=4 #SBATCH --time=10:00 mpirun hello.mpi Tip On Grace's mpi partition, try to make ntasks equal to a multiple of 24.","title":"Multi-process programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#hybrid-mpiopenmp-programs","text":"#!/bin/bash #SBATCH --job-name=hybrid #SBATCH --output=hydrid_job.txt #SBATCH --ntasks=8 #SBATCH --cpus-per-task=5 #SBATCH --nodes=2 #SBATCH --time=10:00 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK mpirun hello_hybrid.mpi","title":"Hybrid (MPI+OpenMP) programs"},{"location":"clusters-at-yale/job-scheduling/slurm-examples/#gpu-job","text":"#!/bin/bash #SBATCH --job-name=deep_learn #SBATCH --output=gpu_job.txt #SBATCH --ntasks=1 #SBATCH --cpus-per-task=2 #SBATCH --gpus=p100:2 #SBATCH --partition=gpu #SBATCH --time=10:00 module load CUDA module load cuDNN # using your anaconda environment source activate deep-learn python my_tensorflow.py","title":"GPU job"},{"location":"data/","text":"Data Storage Below we highlight some data storage option at Yale that are appropriate for research data. For a more complete list of data storage options, see the Storage Finder . If you have questions about selecting an appropriate home for your data, contact us for assistance. HPC Cluster Storage Capacity: Varies. Cost: Varies Sensitive data is only allowed on the Milgram cluster Only available on YCRC HPC clusters Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Details can be found on our Cluster Storage page. Additional project-style storage allocations can be purchased. See here for more information. Google Drive via EliApps Warning Changes to Google Drive pricing ITS has informed us of a number of changes to the EliApps Google Drive quotas, including shared drives. 
As of 8/15/23, all new EliApps accounts will have a free quota of 5GB. As of 7/1/24, all existing EliApps accounts will have a free quota of 5GB. Quotas beyond 5GB will be available for $145/TB/yr Therefore, you should probably not consider Google Drive on EliApps for storage large amounts of data. ITS suggested alternatives are Storage@Yale, Teams/SharePoint, or DropBox. Capacity: 400,000 file count quota, 5TiB max file size. Cost: Free No sensitive data (e.g. ePHI, HIPAA) Can be mounted on your local machine and transferred to via Globus Google Drive Connector Google Drive is a cloud service for file storage, document editing and sharing. All members of the Yale community with an EliApps (Google Workspace for Education) account have storage at no cost in the associated Google Drive account. Moreover, EliApps users can request Shared Drives, which are shared spaces where all files are group-owned. For more information on Google Drive through EliApps, see our Google Drive documentation . Storage @ Yale Capacity: As requested. Cost: See below No sensitive data (e.g. ePHI, HIPAA) for cluster mounts Can be mounted on the cluster or computers on campus (but not both) Storage @ Yale (S@Y) is a central storage service provided by ITS. S@Y shares can either be accessible on campus computers or the clusters, but not both. Type Use Object Tier Good for staging data between cloud and clusters Active Tier Daily use, still copy to cluster before using in jobs Archive Tier Long term storage, low access. Make sure to properly archive Backup Tier Low-access remote object backup. Make sure to properly archive For pricing information, see the ITS Data Rates . All prices are charged monthly for storage used at that time. To request a share, press the \u201cRequest this Service\u201d button in the right sidebar on the Storage@Yale website . If you would like to request a share that is mounted on the clusters, specify in your request that the share be mounted from the HPC clusters . If you elect to use archive tier storage, be cognizant of its performance characteristics . Cluster I/O Performance Since cluster-mounted S@Y shares do not provide sufficient performance for use in jobs, they are not mounted on our compute or login nodes. To access S@Y on the clusters, connect to one of the transfer nodes to stage the data to Project or Scratch60 before running jobs. Microsoft Teams/SharePoint Capacity: 25 TB, 250 GB per file. Cost: Free You can request a Team and 25TiB of underlying SharePoint storage space from ITS Email And Collaboration Services . For more information on The relationship between Teams, SharePoint, and OneDrive, see the official Microsoft post on the subject . Dropbox at Yale ITS offers departmental subscriptions to DropBox for a low cost (currently $23.66/user/year). Unlimited storage (take this with a grain of salt) Low risk data only For more information about DropBox at Yale, see the ITS website. Box at Yale Capacity: 50GiB per user. Cost: Free. 15 GiB max file size. Sensitive data (e.g. ePHI, HIPAA) only in Secure Box Can be mounted on your local machine and transferred with rclone All members of the Yale community have access to a share at Box at Yale. Box is another cloud-based file sharing and storage service. You can upload and access your data using the web portal and sync data with your local machines via Box Sync. To access, navigate to yale.box.com and login with your yale.edu account. For sync with your local machine, install Box Sync and authenticate with your yale.edu account. 
For more information about Box at Yale, see the ITS website. To learn more about these options, see the Yale Collaboration Counts page available through Yale ITS for details.","title":"Data Storage"},{"location":"data/#data-storage","text":"Below we highlight some data storage option at Yale that are appropriate for research data. For a more complete list of data storage options, see the Storage Finder . If you have questions about selecting an appropriate home for your data, contact us for assistance.","title":"Data Storage"},{"location":"data/#hpc-cluster-storage","text":"Capacity: Varies. Cost: Varies Sensitive data is only allowed on the Milgram cluster Only available on YCRC HPC clusters Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Details can be found on our Cluster Storage page. Additional project-style storage allocations can be purchased. See here for more information.","title":"HPC Cluster Storage"},{"location":"data/#google-drive-via-eliapps","text":"Warning Changes to Google Drive pricing ITS has informed us of a number of changes to the EliApps Google Drive quotas, including shared drives. As of 8/15/23, all new EliApps accounts will have a free quota of 5GB. As of 7/1/24, all existing EliApps accounts will have a free quota of 5GB. Quotas beyond 5GB will be available for $145/TB/yr Therefore, you should probably not consider Google Drive on EliApps for storage large amounts of data. ITS suggested alternatives are Storage@Yale, Teams/SharePoint, or DropBox. Capacity: 400,000 file count quota, 5TiB max file size. Cost: Free No sensitive data (e.g. ePHI, HIPAA) Can be mounted on your local machine and transferred to via Globus Google Drive Connector Google Drive is a cloud service for file storage, document editing and sharing. All members of the Yale community with an EliApps (Google Workspace for Education) account have storage at no cost in the associated Google Drive account. Moreover, EliApps users can request Shared Drives, which are shared spaces where all files are group-owned. For more information on Google Drive through EliApps, see our Google Drive documentation .","title":"Google Drive via EliApps"},{"location":"data/#storage-yale","text":"Capacity: As requested. Cost: See below No sensitive data (e.g. ePHI, HIPAA) for cluster mounts Can be mounted on the cluster or computers on campus (but not both) Storage @ Yale (S@Y) is a central storage service provided by ITS. S@Y shares can either be accessible on campus computers or the clusters, but not both. Type Use Object Tier Good for staging data between cloud and clusters Active Tier Daily use, still copy to cluster before using in jobs Archive Tier Long term storage, low access. Make sure to properly archive Backup Tier Low-access remote object backup. Make sure to properly archive For pricing information, see the ITS Data Rates . All prices are charged monthly for storage used at that time. To request a share, press the \u201cRequest this Service\u201d button in the right sidebar on the Storage@Yale website . If you would like to request a share that is mounted on the clusters, specify in your request that the share be mounted from the HPC clusters . If you elect to use archive tier storage, be cognizant of its performance characteristics . 
Cluster I/O Performance Since cluster-mounted S@Y shares do not provide sufficient performance for use in jobs, they are not mounted on our compute or login nodes. To access S@Y on the clusters, connect to one of the transfer nodes to stage the data to Project or Scratch60 before running jobs.","title":"Storage @ Yale"},{"location":"data/#microsoft-teamssharepoint","text":"Capacity: 25 TB, 250 GB per file. Cost: Free You can request a Team and 25TiB of underlying SharePoint storage space from ITS Email And Collaboration Services . For more information on The relationship between Teams, SharePoint, and OneDrive, see the official Microsoft post on the subject .","title":"Microsoft Teams/SharePoint"},{"location":"data/#dropbox-at-yale","text":"ITS offers departmental subscriptions to DropBox for a low cost (currently $23.66/user/year). Unlimited storage (take this with a grain of salt) Low risk data only For more information about DropBox at Yale, see the ITS website.","title":"Dropbox at Yale"},{"location":"data/#box-at-yale","text":"Capacity: 50GiB per user. Cost: Free. 15 GiB max file size. Sensitive data (e.g. ePHI, HIPAA) only in Secure Box Can be mounted on your local machine and transferred with rclone All members of the Yale community have access to a share at Box at Yale. Box is another cloud-based file sharing and storage service. You can upload and access your data using the web portal and sync data with your local machines via Box Sync. To access, navigate to yale.box.com and login with your yale.edu account. For sync with your local machine, install Box Sync and authenticate with your yale.edu account. For more information about Box at Yale, see the ITS website. To learn more about these options, see the Yale Collaboration Counts page available through Yale ITS for details.","title":"Box at Yale"},{"location":"data/archive/","text":"Archive Your Data Clean Out Unnecessary Files Not every file created during a project needs to be archived. If you proactively reduce the number of extraneous files in your archive, you will both reduce storage costs and increase the usefulness of that data upon retrieval. Common files that can be deleted when archiving data include: Compiled codes, such as .o or .pyc files. These files will likely not even work on the next system you may restore these data to and they can contribute significantly to your file count limit. Just keep the source code and clean installation instructions. Some log files. Many log created by the system are not necessary to store indefinitely. Any Slurm logs from failed runs (prior to a successful run) or outputs from Matlab (e.g. hs_error_pid*.log , java.log.* ) can often safely be ignored. Crash files such are core dumps (e.g. core.* , matlab_crash_dump. ). Compress Your Data Most archive locations (S@Y Archive Tier, Google Drive) perform much better with a smaller number of larger files. In fact, Google Shared Drives have a file count limit of 400,000 files. Therefore, it is highly recommended that your compress, using zip or tar , portions of your data for ease of storage and retrieval. For example, to create a compressed archive of a directory you can do the following: tar -cvzf archive-2021-04-26.tar.gz ./data_for_archival This will create a new file ( archive-2021-04-26.tar.gz ) which contains all the data from within data_for_archival and is compressed to minimize storage requirements. This file can then be transferred to any off-site backup or archive location. 
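As a sketch of that final step, assuming your group has a cluster-mounted Storage@Yale archive share (the /SAY path below is a placeholder, not a real share), the tarball can be copied over from a transfer node and verified before you delete the originals: cp archive-2021-04-26.tar.gz /SAY/archive/my_lab_share/ # hypothetical archive share path md5sum archive-2021-04-26.tar.gz /SAY/archive/my_lab_share/archive-2021-04-26.tar.gz # the two checksums should match Keep in mind the archive-tier guidance below: write a small number of large tar files rather than many small ones.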
List and Extract Data From Existing Archive You can list the contents of an archive file like this: tar -ztvf archive-2021-04-26.tar.gz which will print the full list of every file within the archive. The clusters also have the lz tool installed that provides a shorter way to list the contents: lz archive-2021-04-26.tar.gz You can then extract a single file from a large tar-file without decompressing the full thing: tar -zxvf archive-2021-04-26.tar.gz path/to/file.txt There is an alternative syntax that is more legible: tar --extract --file = archive-2021-04-26.tar.gz file.txt Either should work fine on the clusters. Tips for S@Y Archive Tier The archive tier of Storage@Yale is a cloud-based system. It provides an archive location for long-term data, featuring professional systems management, security, and protection from data loss via redundant, enterprise-grade hardware. Data is dual-written to two locations. The cost per TB is subtantially lower than for the active-access S@Y tier. For current pricing, see ITS Data Rates . To use S@Y (Archive) effectively, you need to be aware of how it works and follow some best practices. Note Just as for the S@Y Active Tier , direct access from the cluster should be specified when requesting the share. Direct access from the cluster is only authorized for Low and Moderate risk data. When you write to the archive, you are actually copying to a large hard disk-based cache, so writes are normally fast. Your copy will appear to complete as soon as the file is in the disk cache. It is NOT yet in the cloud. In the background, the system will flush files to the cloud and delete them from the cache. If you read a file very soon after you write it, it is probably still in the cache, and your read will be quick. However, once some time has elapsed and the file has been moved to the cloud, read speed will be somewhat slower. Note S@Y Archive has a single-filesize limit of 5 TB, so plan your data compressions accordingly. Some key takeaways: Operations that only read the metadata of files will be fast (ls, find, etc) even if the file is in the cloud, since metadata is kept in the disk cache. Operations that actually read the file (cp, wc -l, tar, etc) will require recovering the entire file to disk cache first, and can take several minutes or longer depending on how busy the system is. If many files will need to be recovered together, it is much better to store them as a single file first with tar or zip, then write that file to the archive. Please do NOT write huge numbers of small files. They will be difficult or impossible to restore in large numbers. Please do NOT do repetitive operations like rsyncs to the archive, since they overload the system. S@Y Backup Tier Yale ITS offers dedicated offsite \"S3\"-style object storage for data backup and archive to the cloud. Clients are responsible for the data transfers and recovery via the S3 protocol, such as by using RClone . The Backup Tier is authorized for Low, Moderate, and High Risk data. As with the Archive Tier, the Backup Tier is low-speed and not meant for daily use. For current pricing, see ITS Data Rates .","title":"Archive Your Data"},{"location":"data/archive/#archive-your-data","text":"","title":"Archive Your Data"},{"location":"data/archive/#clean-out-unnecessary-files","text":"Not every file created during a project needs to be archived. 
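A concrete sketch of that clean-up, covering the compiled-code and crash files named in this guide (dry-run first by leaving off -delete, and adjust the patterns to your own project): find data_for_archival \( -name "*.o" -o -name "*.pyc" -o -name "core.*" \) | wc -l # count what would be removed find data_for_archival \( -name "*.o" -o -name "*.pyc" -o -name "core.*" \) -delete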
If you proactively reduce the number of extraneous files in your archive, you will both reduce storage costs and increase the usefulness of that data upon retrieval. Common files that can be deleted when archiving data include: Compiled codes, such as .o or .pyc files. These files will likely not even work on the next system you may restore these data to and they can contribute significantly to your file count limit. Just keep the source code and clean installation instructions. Some log files. Many log created by the system are not necessary to store indefinitely. Any Slurm logs from failed runs (prior to a successful run) or outputs from Matlab (e.g. hs_error_pid*.log , java.log.* ) can often safely be ignored. Crash files such are core dumps (e.g. core.* , matlab_crash_dump. ).","title":"Clean Out Unnecessary Files"},{"location":"data/archive/#compress-your-data","text":"Most archive locations (S@Y Archive Tier, Google Drive) perform much better with a smaller number of larger files. In fact, Google Shared Drives have a file count limit of 400,000 files. Therefore, it is highly recommended that your compress, using zip or tar , portions of your data for ease of storage and retrieval. For example, to create a compressed archive of a directory you can do the following: tar -cvzf archive-2021-04-26.tar.gz ./data_for_archival This will create a new file ( archive-2021-04-26.tar.gz ) which contains all the data from within data_for_archival and is compressed to minimize storage requirements. This file can then be transferred to any off-site backup or archive location.","title":"Compress Your Data"},{"location":"data/archive/#list-and-extract-data-from-existing-archive","text":"You can list the contents of an archive file like this: tar -ztvf archive-2021-04-26.tar.gz which will print the full list of every file within the archive. The clusters also have the lz tool installed that provides a shorter way to list the contents: lz archive-2021-04-26.tar.gz You can then extract a single file from a large tar-file without decompressing the full thing: tar -zxvf archive-2021-04-26.tar.gz path/to/file.txt There is an alternative syntax that is more legible: tar --extract --file = archive-2021-04-26.tar.gz file.txt Either should work fine on the clusters.","title":"List and Extract Data From Existing Archive"},{"location":"data/archive/#tips-for-sy-archive-tier","text":"The archive tier of Storage@Yale is a cloud-based system. It provides an archive location for long-term data, featuring professional systems management, security, and protection from data loss via redundant, enterprise-grade hardware. Data is dual-written to two locations. The cost per TB is subtantially lower than for the active-access S@Y tier. For current pricing, see ITS Data Rates . To use S@Y (Archive) effectively, you need to be aware of how it works and follow some best practices. Note Just as for the S@Y Active Tier , direct access from the cluster should be specified when requesting the share. Direct access from the cluster is only authorized for Low and Moderate risk data. When you write to the archive, you are actually copying to a large hard disk-based cache, so writes are normally fast. Your copy will appear to complete as soon as the file is in the disk cache. It is NOT yet in the cloud. In the background, the system will flush files to the cloud and delete them from the cache. If you read a file very soon after you write it, it is probably still in the cache, and your read will be quick. 
However, once some time has elapsed and the file has been moved to the cloud, read speed will be somewhat slower. Note S@Y Archive has a single-filesize limit of 5 TB, so plan your data compressions accordingly. Some key takeaways: Operations that only read the metadata of files will be fast (ls, find, etc) even if the file is in the cloud, since metadata is kept in the disk cache. Operations that actually read the file (cp, wc -l, tar, etc) will require recovering the entire file to disk cache first, and can take several minutes or longer depending on how busy the system is. If many files will need to be recovered together, it is much better to store them as a single file first with tar or zip, then write that file to the archive. Please do NOT write huge numbers of small files. They will be difficult or impossible to restore in large numbers. Please do NOT do repetitive operations like rsyncs to the archive, since they overload the system.","title":"Tips for S@Y Archive Tier"},{"location":"data/archive/#sy-backup-tier","text":"Yale ITS offers dedicated offsite \"S3\"-style object storage for data backup and archive to the cloud. Clients are responsible for the data transfers and recovery via the S3 protocol, such as by using RClone . The Backup Tier is authorized for Low, Moderate, and High Risk data. As with the Archive Tier, the Backup Tier is low-speed and not meant for daily use. For current pricing, see ITS Data Rates .","title":"S@Y Backup Tier"},{"location":"data/archived-sequencing/","text":"YCGA Sequence Data Archive Retrieve Data from the Archive In the sequencing archive on McCleary , a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists on tape or in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of files the file is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/sequencers /ycga-ba/ba_sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ba_sequencers /gpfs/ycga/sequencers/illumina/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencers You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files over 500GB, sometimes fail to download. If you run into problems, contact us at hpc@yale.edu and we can manually download it. cd ~/scratch60/somedir tar \u2013xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip \u2013d M20_ACAGTG_L008_R1_009.fastq.qp For your convenience, we have a tool, restore , that will download a tar file, untar it, and uncompress all quip files. module load ycga-public restore \u2013t /SAY/archive/YCGA-729009-YCGA/archive/path/to/file.tar If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the same module as restore. 
locateRun C9374AN Restore spends most of the time running quip. You can parallelize and thereby speed up that process using the -n flag. restore -n 20 ... Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node, and make sure to allocate sufficient resources to your job, e.g. -c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See here . Example: Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Initialize compute node with 20 cores salloc -c 20 module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar, matching our netid. step 4 Use the restore utility to copy and uncompress the fastq files from the tar file. By default, restore will start 20 threads, which matches our salloc allocation above. The restore will likely take several minutes. To see progress, you can use the -v flag. restore -v -t /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"YCGA Sequence Data Archive"},{"location":"data/archived-sequencing/#ycga-sequence-data-archive","text":"","title":"YCGA Sequence Data Archive"},{"location":"data/archived-sequencing/#retrieve-data-from-the-archive","text":"In the sequencing archive on McCleary , a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists on tape or in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of a file, the file is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/sequencers /ycga-ba/ba_sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ba_sequencers /gpfs/ycga/sequencers/illumina/sequencers /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencers You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files (over 500GB) sometimes fail to download. If you run into problems, contact us at hpc@yale.edu and we can manually download it. 
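The same retrieval can also be wrapped in a batch script instead of an interactive salloc session. A minimal sketch, reusing the restore flags shown above (we assume -n and -t can be combined as the separate examples suggest; the tar path is the documentation's placeholder and the time limit is only an example): #!/bin/bash #SBATCH --job-name=restore_run #SBATCH --cpus-per-task=20 #SBATCH --mem=100G #SBATCH --time=4:00:00 module load ycga-public restore -n 20 -t /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Submit it with sbatch restore_run.sh so the untar/unquip work runs on a compute node, as recommended above.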
cd ~/scratch60/somedir tar \u2013xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip \u2013d M20_ACAGTG_L008_R1_009.fastq.qp For your convenience, we have a tool, restore , that will download a tar file, untar it, and uncompress all quip files. module load ycga-public restore \u2013t /SAY/archive/YCGA-729009-YCGA/archive/path/to/file.tar If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the same module as restore. locateRun C9374AN Restore spends most of the time running quip. You can parallelize and thereby speed up that process using the -n flag. restore \u2013n 20 ... Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node and make sure to allocate sufficient resources to your job, e.g. \u2013c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See here .","title":"Retrieve Data from the Archive"},{"location":"data/archived-sequencing/#example","text":"Imagine that user rdb9 wants to restore data from run BHJWZZBCX3","title":"Example:"},{"location":"data/archived-sequencing/#step-1","text":"Initialize compute node with 20 cores salloc -c 20 module load ycga-public","title":"step 1"},{"location":"data/archived-sequencing/#step-2","text":"Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed.","title":"step 2"},{"location":"data/archived-sequencing/#step-3","text":"List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar, matching our netid.","title":"step 3"},{"location":"data/archived-sequencing/#step-4","text":"Use the restore utility to copy and uncompress the fastq files from the tar file. By default, restore will start 20 threads, which matches our srun above. The restore will likely take several minutes. To see progress, you can use the -v flag. restore -v -t /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWKHBCX3_1_Unaligned-1_Project_Rdb9.tar The restored fastq files will written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"step 4"},{"location":"data/backups/","text":"Backups and Snapshots The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). See the individual cluster documentation for more details about which storage is backed up or has snapshots. Please see our HPC Policies page for additional information about backups. 
Retrieve Data from Home Backups Contact us with your netid and the list of files/directories you would like restored. For any data deleted in the last couple days, first try the self-service snapshots described below. Retrieve Data from Snapshots Our clusters create snapshots nightly on portions of the filesystem so that you can retrieve mistakenly modified or deleted files for yourself. We do not currently provide snapshots of scratch storage. As long as your files existed in the form you want them in before the most recent midnight and the deletion was in the last few days, they can probably be recovered. Snapshot directory structure mirrors the files that are being tracked with a prefix, listed in the table below. Contact us if you need assistance finding the appropriate snapshot location for your files. File set Snapshot Prefix /gpfs/gibbs/project /gpfs/gibbs/project/.snapshots /gpfs/gibbs/pi/group /gpfs/gibbs/pi/group/.snapshots /vast/palmer/home.grace /vast/palmer/home.grace/.snapshot /vast/palmer/home.mccleary /vast/palmer/home.mccleary/.snapshot /gpfs/ycga /gpfs/ycga/.snapshots /gpfs/milgram/home /gpfs/milgram/home/.snapshots /gpfs/milgram/project /gpfs/milgram/project/.snapshots /gpfs/milgram/pi/groupname /gpfs/milgram/pi/groupname/.snapshots /gpfs/slayman/pi/gerstein /gpfs/slayman/pi/gerstein/.snapshots Within the snapshot directory, you will find multiple directories with names that indicate specific dates. For example, if you wanted to recover the file /gpfs/gibbs/project/bjornson/rdb9/doit.sh (a file in the bjornson group's project directory owned by rdb9) it would be found at /gpfs/gibbs/.snapshots/date/project/bjornson/rdb9/doit.sh . Snapshot Sizes Because of the way snapshots are stored, sizes will not be correctly reported until you copy your files/directories back out of the .snapshots directory.","title":"Backups and Snapshots"},{"location":"data/backups/#backups-and-snapshots","text":"The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). See the individual cluster documentation for more details about which storage is backed up or has snapshots. Please see our HPC Policies page for additional information about backups.","title":"Backups and Snapshots"},{"location":"data/backups/#retrieve-data-from-home-backups","text":"Contact us with your netid and the list of files/directories you would like restored. For any data deleted in the last couple days, first try the self-service snapshots described below.","title":"Retrieve Data from Home Backups"},{"location":"data/backups/#retrieve-data-from-snapshots","text":"Our clusters create snapshots nightly on portions of the filesystem so that you can retrieve mistakenly modified or deleted files for yourself. We do not currently provide snapshots of scratch storage. As long as your files existed in the form you want them in before the most recent midnight and the deletion was in the last few days, they can probably be recovered. Snapshot directory structure mirrors the files that are being tracked with a prefix, listed in the table below. Contact us if you need assistance finding the appropriate snapshot location for your files. 
File set Snapshot Prefix /gpfs/gibbs/project /gpfs/gibbs/project/.snapshots /gpfs/gibbs/pi/group /gpfs/gibbs/pi/group/.snapshots /vast/palmer/home.grace /vast/palmer/home.grace/.snapshot /vast/palmer/home.mccleary /vast/palmer/home.mccleary/.snapshot /gpfs/ycga /gpfs/ycga/.snapshots /gpfs/milgram/home /gpfs/milgram/home/.snapshots /gpfs/milgram/project /gpfs/milgram/project/.snapshots /gpfs/milgram/pi/groupname /gpfs/milgram/pi/groupname/.snapshots /gpfs/slayman/pi/gerstein /gpfs/slayman/pi/gerstein/.snapshots Within the snapshot directory, you will find multiple directories with names that indicate specific dates. For example, if you wanted to recover the file /gpfs/gibbs/project/bjornson/rdb9/doit.sh (a file in the bjornson group's project directory owned by rdb9) it would be found at /gpfs/gibbs/.snapshots/date/project/bjornson/rdb9/doit.sh . Snapshot Sizes Because of the way snapshots are stored, sizes will not be correctly reported until you copy your files/directories back out of the .snapshots directory.","title":"Retrieve Data from Snapshots"},{"location":"data/external/","text":"Share Data Outside Yale Share data using Microsoft OneDrive Yale ITS's recommended way to send other people large files is by using Microsoft OneDrive. See details . Public Website Researchers frequently ask how they can set up a public website to share data or provide a web-based application. The easiest way to do this is by using Yale ITS's spinup service. First get an account on Spinup . Info When getting your account on Spinup, you will need to provide a charging account (aka COA). Static website You can use a static website with a public address to serve data publicly to collaborators or services that need to see the data via http. A common example of this is hosting tracks for the UCSC Genome Browser. Note that this only serves static files. If you wish to host a dynamic web application, see below. ITS's spinup service makes creating a static website easy and inexpensive. Follow their instructions on creating a static website , giving it an appropriate website name. Make sure to save the access key and secret key, since you'll need them to connect to the website. The static website will incur a small charge per month of a few cents per GB stored or downloaded. Then use an S3 transfer tool like Cyberduck, AWS CLI, or CrossFTP to connect to the website and transfer your files. The spinup page for your static website provides a link to a Cyberduck config file. That is the probably the easiest way to connect. UCSC Hub To set up the UCSC Hub, follow their directions to set up the appropriate file heirarchy on your static website, using the transfer tool. Web-based application If your web application goes beyond simply serving static data, the best solution is to create a spinup virtual machine (VM), set up your web application on the VM, then follow the spinup instructions on requesting public access to a web server Info Running a VM 24x7 can incur significant costs on spinup, depending on the size of the VM. Private Share Using Globus Globus can be used to shared data hosts on one of the clusters privately with a specific person or group of people. From the file manager interface enter the name of the endpoint you would like to share from in the collection field (e.g. 
yale#grace) Click the Share button on the right Click on \"Add a Shared Endpoint\" Next to Path, click \"Browse\" to find and select the directory you want to share Add other details as desired and click on \"Create Share\" Click on \"Add Permissions -- Share With\" Under \"Username or Email\" enter the e-mail address of the person that you want to share the data with, then click on \"Save\", then click on \"Add Permission\" Do not select \"write\" unless you want the person you are sharing the data with to be able to write to your storage on the cluster. For more information, please see the official Globus Documentation .","title":"Share Data Outside Yale"},{"location":"data/external/#share-data-outside-yale","text":"","title":"Share Data Outside Yale"},{"location":"data/external/#share-data-using-microsoft-onedrive","text":"Yale ITS's recommended way to send other people large files is by using Microsoft OneDrive. See details .","title":"Share data using Microsoft OneDrive"},{"location":"data/external/#public-website","text":"Researchers frequently ask how they can set up a public website to share data or provide a web-based application. The easiest way to do this is by using Yale ITS's spinup service. First get an account on Spinup . Info When getting your account on Spinup, you will need to provide a charging account (aka COA).","title":"Public Website"},{"location":"data/external/#static-website","text":"You can use a static website with a public address to serve data publicly to collaborators or services that need to see the data via http. A common example of this is hosting tracks for the UCSC Genome Browser. Note that this only serves static files. If you wish to host a dynamic web application, see below. ITS's spinup service makes creating a static website easy and inexpensive. Follow their instructions on creating a static website , giving it an appropriate website name. Make sure to save the access key and secret key, since you'll need them to connect to the website. The static website will incur a small charge per month of a few cents per GB stored or downloaded. Then use an S3 transfer tool like Cyberduck, AWS CLI, or CrossFTP to connect to the website and transfer your files. The spinup page for your static website provides a link to a Cyberduck config file. That is the probably the easiest way to connect.","title":"Static website"},{"location":"data/external/#ucsc-hub","text":"To set up the UCSC Hub, follow their directions to set up the appropriate file heirarchy on your static website, using the transfer tool.","title":"UCSC Hub"},{"location":"data/external/#web-based-application","text":"If your web application goes beyond simply serving static data, the best solution is to create a spinup virtual machine (VM), set up your web application on the VM, then follow the spinup instructions on requesting public access to a web server Info Running a VM 24x7 can incur significant costs on spinup, depending on the size of the VM.","title":"Web-based application"},{"location":"data/external/#private-share-using-globus","text":"Globus can be used to shared data hosts on one of the clusters privately with a specific person or group of people. From the file manager interface enter the name of the endpoint you would like to share from in the collection field (e.g. 
yale#grace) Click the Share button on the right Click on \"Add a Shared Endpoint\" Next to Path, click \"Browse\" to find and select the directory you want to share Add other details as desired and click on \"Create Share\" Click on \"Add Permissions -- Share With\" Under \"Username or Email\" enter the e-mail address of the person that you want to share the data with, then click on \"Save\", then click on \"Add Permission\" Do not select \"write\" unless you want the person you are sharing the data with to be able to write to your storage on the cluster. For more information, please see the official Globus Documentation .","title":"Private Share Using Globus"},{"location":"data/globus/","text":"Large Transfers with Globus For large data transfers both within Yale and to external collaborators, we recommend using Globus. Globus is a file transfer service that is efficient and easy to use. It has several advantages: Robust and fast transfers of large files and/or large collections of files. Files can be transferred between your computer and the clusters. Files can be transferred between Yale and other sites. A web and command-line interface for starting and monitoring transfers. Access to specific files or directories granted to external collaborators in a secure way. Globus transfers data between computers set up as \"endpoints\". The official YCRC endpoints are listed below. Transfers can be to and from these endpoints or those you have defined for yourself with Globus Connect . Course Accounts Globus does not work for course accounts ( _ ). Please try the other transfer methods listed in our Transfer documentation instead. Cluster Endpoints We currently support endpoints for the following clusters. Cluster Globus Endpoint Grace yale#grace McCleary Yale CRC McCleary Milgram Yale CRC Milgram For Grace and McCleary, these endpoints provide access to all files you normally have access to. For security reasons, Milgram Globus uses a staging area ( /gpfs/milgram/globus/$NETID ). Once uploaded, data should be moved from this staging area to its final location within Milgram. Files in the staging area are purged after 21 days. Get Started with Globus In a browser, go to app.globus.org . Use the pull-down menu to select Yale and click \"Continue\". If you are not already logged into CAS, you will be prompted to log in. [First login only] Do not associate with another account yet unless you are familiar with doing this [First login only] Select \"non-profit research or educational purposes\" [First login only] Click on \"Allow\" for allowing Globus Web App From the file manager interface enter the name of the endpoint you would like to browse in the collection field (e.g. yale#grace) Click on the right-hand side menu option \"Transfer or Sync to...\" Enter the second endpoint name in the right search box (e.g. another cluster or your personal endpoint) Select one or more files you would like to transfer and click the appropriate start button on the bottom. To complete a partial transfer, you can click the \"sync\" checkbox in the Transfer Setting window on the Globus page, and hten Globus should resume the transfer where it left off. Manage Your Endpoints To manage your endpoints, such as delete an endpoint, rename it, or share it with additional people (be aware, they will be able to access your storage), go to Manage Endpoint on the Globus website. 
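Transfers can also be started and monitored from the command line with the Globus CLI. A minimal sketch, assuming the globus-cli client is installed (for example in a conda environment) and that the endpoint UUIDs below are placeholders you look up first: globus login globus endpoint search "Yale CRC McCleary" # note the UUID of each endpoint you need globus transfer --recursive SRC_UUID:/path/to/data DST_UUID:/path/to/destination --label "my transfer" globus task show TASK_ID # check on the task ID returned by the previous command The web interface described above remains the simplest route; the CLI is mainly useful for scripting recurring transfers.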
Setup an Endpoint on Your Computer You can set up your own endpoint for transferring data to and from your own computer with Globus Connect Personal . To transfer or share data between two personal endpoints, you will need to request access to the YCRC's Globus Plus subscription on this page . Setup a Google Drive Endpoint The Globus connector is configured to only allow data to be uploaded into EliApps (Yale's GSuite for Education) Google Drive accounts. If you don't have an EliApps account, request one as described above. To set up your Globus Google Drive endpoint, click on the following link: Setup Globus Google Drive Endpoint Log into Globus, if needed. The first time you login to the Globus Google Drive endpoint, you will be presented with a permissions approval page. If you are ok with the Connector manipulating your files through Globus (which is required), click the Allow button. You may see your Yale EliApps account expressed in an uncommon format, such as netid@yale.edu@accounts.google.com. This is normal, and expected. After your approvals you will be directed to the Globus File Manager, with the default view of \"/My Drive\". To see \"/Team Drives\" and other Google Drive features use the \"up one folder\" arrow icon in the File Manager. To transfer to or from your Google Drive, search in the Collection field for \"YCRC Globus Google Drive Collection\". Note There are \"rate limits\" to how much data and how many files you can transfer in any 24 hours period. If you have hit your rate limit, Globus should automatically resume the transfer during the next 24 hour period. You see a \"Endpoint Busy\" error during this time. Google has a 400,000 file limit per Shared Drive , so if you are archiving data to Google Drive, it is better to compress folders that contain lots of small files (e.g. using tar ) before transferring. In our testing, we have seen up to 10MB/s upload and 100MB/s download speeds. Setup a S3 Endpoint We support creating Globus S3 endpoints. To request a Globus S3 Endpoint, please contact YCRC . Please include in your request: S3 bucket name The Amazon Region for that bucket An initial list of Yale NetIDs who should be able to access the bucket Warning Please DO NOT send us the Amazon login credentials through an insecure method such as email or our ticketing system. After we have created your Globus S3 endpoint, you will be able to further self-serve you own access controls with the Globus portal.","title":"Large Transfers with Globus"},{"location":"data/globus/#large-transfers-with-globus","text":"For large data transfers both within Yale and to external collaborators, we recommend using Globus. Globus is a file transfer service that is efficient and easy to use. It has several advantages: Robust and fast transfers of large files and/or large collections of files. Files can be transferred between your computer and the clusters. Files can be transferred between Yale and other sites. A web and command-line interface for starting and monitoring transfers. Access to specific files or directories granted to external collaborators in a secure way. Globus transfers data between computers set up as \"endpoints\". The official YCRC endpoints are listed below. Transfers can be to and from these endpoints or those you have defined for yourself with Globus Connect . Course Accounts Globus does not work for course accounts ( _ ). 
Please try the other transfer methods listed in our Transfer documentation instead.","title":"Large Transfers with Globus"},{"location":"data/globus/#cluster-endpoints","text":"We currently support endpoints for the following clusters. Cluster Globus Endpoint Grace yale#grace McCleary Yale CRC McCleary Milgram Yale CRC Milgram For Grace and McCleary, these endpoints provide access to all files you normally have access to. For security reasons, Milgram Globus uses a staging area ( /gpfs/milgram/globus/$NETID ). Once uploaded, data should be moved from this staging area to its final location within Milgram. Files in the staging area are purged after 21 days.","title":"Cluster Endpoints"},{"location":"data/globus/#get-started-with-globus","text":"In a browser, go to app.globus.org . Use the pull-down menu to select Yale and click \"Continue\". If you are not already logged into CAS, you will be prompted to log in. [First login only] Do not associate with another account yet unless you are familiar with doing this [First login only] Select \"non-profit research or educational purposes\" [First login only] Click on \"Allow\" for allowing Globus Web App From the file manager interface enter the name of the endpoint you would like to browse in the collection field (e.g. yale#grace) Click on the right-hand side menu option \"Transfer or Sync to...\" Enter the second endpoint name in the right search box (e.g. another cluster or your personal endpoint) Select one or more files you would like to transfer and click the appropriate start button on the bottom. To complete a partial transfer, you can click the \"sync\" checkbox in the Transfer Setting window on the Globus page, and hten Globus should resume the transfer where it left off.","title":"Get Started with Globus"},{"location":"data/globus/#manage-your-endpoints","text":"To manage your endpoints, such as delete an endpoint, rename it, or share it with additional people (be aware, they will be able to access your storage), go to Manage Endpoint on the Globus website.","title":"Manage Your Endpoints"},{"location":"data/globus/#setup-an-endpoint-on-your-computer","text":"You can set up your own endpoint for transferring data to and from your own computer with Globus Connect Personal . To transfer or share data between two personal endpoints, you will need to request access to the YCRC's Globus Plus subscription on this page .","title":"Setup an Endpoint on Your Computer"},{"location":"data/globus/#setup-a-google-drive-endpoint","text":"The Globus connector is configured to only allow data to be uploaded into EliApps (Yale's GSuite for Education) Google Drive accounts. If you don't have an EliApps account, request one as described above. To set up your Globus Google Drive endpoint, click on the following link: Setup Globus Google Drive Endpoint Log into Globus, if needed. The first time you login to the Globus Google Drive endpoint, you will be presented with a permissions approval page. If you are ok with the Connector manipulating your files through Globus (which is required), click the Allow button. You may see your Yale EliApps account expressed in an uncommon format, such as netid@yale.edu@accounts.google.com. This is normal, and expected. After your approvals you will be directed to the Globus File Manager, with the default view of \"/My Drive\". To see \"/Team Drives\" and other Google Drive features use the \"up one folder\" arrow icon in the File Manager. 
To transfer to or from your Google Drive, search in the Collection field for \"YCRC Globus Google Drive Collection\". Note There are \"rate limits\" to how much data and how many files you can transfer in any 24 hours period. If you have hit your rate limit, Globus should automatically resume the transfer during the next 24 hour period. You see a \"Endpoint Busy\" error during this time. Google has a 400,000 file limit per Shared Drive , so if you are archiving data to Google Drive, it is better to compress folders that contain lots of small files (e.g. using tar ) before transferring. In our testing, we have seen up to 10MB/s upload and 100MB/s download speeds.","title":"Setup a Google Drive Endpoint"},{"location":"data/globus/#setup-a-s3-endpoint","text":"We support creating Globus S3 endpoints. To request a Globus S3 Endpoint, please contact YCRC . Please include in your request: S3 bucket name The Amazon Region for that bucket An initial list of Yale NetIDs who should be able to access the bucket Warning Please DO NOT send us the Amazon login credentials through an insecure method such as email or our ticketing system. After we have created your Globus S3 endpoint, you will be able to further self-serve you own access controls with the Globus portal.","title":"Setup a S3 Endpoint"},{"location":"data/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that 
is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. 
Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent terminals alongside each other","title":"Glossary"},{"location":"data/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"data/google-drive/","text":"Google Drive Through Yale Google Apps for Education (EliApps), researchers have access to 5GB of storage with the option to purchase additional storage as needed. The Globus Google Drive connector allows you to create a Globus endpoint that allows you to use the Globus infrastructure to transfer data into your Google Drive account. As always, no sensitive data (e.g. ePHI, HIPAA) is allowed in Google Drive storage. EliApps If your Yale email account is already an EliApps account (Gmail), then you are all set. If your Yale email is in Microsoft Office365, send an email to the ITS helpdesk requesting a \"no-email EliApps account\". Once it is created you can login to Google Drive using your EliApps account name, which will be of the form netid@yale.edu . The Globus connector is configured to only allow data to be uploaded into EliApps Google Drive accounts. Google Shared Drives (formerly Team Drive) Shared Drives is an additional feature for EliApps that is available by request only (at the moment). A Shared Drive is a Google Drive space that solves a lot of ownership and permissions issues present with traditional shared Google Drive folder. Once you create a Shared Drive, e.g. for a project or research group, any data placed in that Drive are owned by the drive and the permission (which accounts can own or access the data) can be easily managed from the Shared Drive interface by drive owners. With Shared Drive, you can be sure the data will stay with research group as students and postdocs come and go. If your group already uses Google Drive, contact us if you need additional Shared Drives. Although group members are limited to a default of 5GB of EliApps Storage, this can be increased as needed by reaching out through the Yale ITS Google Shared page . Aside from these quota limits, there are also limits for Google Shared Drives put in place by Google directly. Some are listed below. Warning To keep file counts low (and for easier data retrieval) we highly recommended that you archive your data using zip or tar . Limit type Limit Number of files and folders 400,000 Daily upload cap 750 GiB Max individual file size 5 TiB Max number of nested folders 20 Local File Access You can upload and access your data using the web portal and sync data with your local machines via the Google File Stream software. For sync with your local machine, install Drive for desktop . Authenticate with your EliApps account and you will see Google Drive mounted as an additional drive on your machine. Rclone You can also transfer data using the command line utility Rclone . Rclone can be used to transfer data to any Google Drive account. Globus Google Drive Connector You can use Globus to transfer data to/from any EliApps Google Drive as well. See our Globus documentation for more information.","title":"Google Drive"},{"location":"data/google-drive/#google-drive","text":"Through Yale Google Apps for Education (EliApps), researchers have access to 5GB of storage with the option to purchase additional storage as needed. The Globus Google Drive connector allows you to create a Globus endpoint that allows you to use the Globus infrastructure to transfer data into your Google Drive account. As always, no sensitive data (e.g. 
ePHI, HIPAA) is allowed in Google Drive storage.","title":"Google Drive"},{"location":"data/google-drive/#eliapps","text":"If your Yale email account is already an EliApps account (Gmail), then you are all set. If your Yale email is in Microsoft Office365, send an email to the ITS helpdesk requesting a \"no-email EliApps account\". Once it is created you can login to Google Drive using your EliApps account name, which will be of the form netid@yale.edu . The Globus connector is configured to only allow data to be uploaded into EliApps Google Drive accounts.","title":"EliApps"},{"location":"data/google-drive/#google-shared-drives-formerly-team-drive","text":"Shared Drives is an additional feature for EliApps that is available by request only (at the moment). A Shared Drive is a Google Drive space that solves a lot of ownership and permissions issues present with traditional shared Google Drive folder. Once you create a Shared Drive, e.g. for a project or research group, any data placed in that Drive are owned by the drive and the permission (which accounts can own or access the data) can be easily managed from the Shared Drive interface by drive owners. With Shared Drive, you can be sure the data will stay with research group as students and postdocs come and go. If your group already uses Google Drive, contact us if you need additional Shared Drives. Although group members are limited to a default of 5GB of EliApps Storage, this can be increased as needed by reaching out through the Yale ITS Google Shared page . Aside from these quota limits, there are also limits for Google Shared Drives put in place by Google directly. Some are listed below. Warning To keep file counts low (and for easier data retrieval) we highly recommended that you archive your data using zip or tar . Limit type Limit Number of files and folders 400,000 Daily upload cap 750 GiB Max individual file size 5 TiB Max number of nested folders 20","title":"Google Shared Drives (formerly Team Drive)"},{"location":"data/google-drive/#local-file-access","text":"You can upload and access your data using the web portal and sync data with your local machines via the Google File Stream software. For sync with your local machine, install Drive for desktop . Authenticate with your EliApps account and you will see Google Drive mounted as an additional drive on your machine.","title":"Local File Access"},{"location":"data/google-drive/#rclone","text":"You can also transfer data using the command line utility Rclone . Rclone can be used to transfer data to any Google Drive account.","title":"Rclone"},{"location":"data/google-drive/#globus-google-drive-connector","text":"You can use Globus to transfer data to/from any EliApps Google Drive as well. See our Globus documentation for more information.","title":"Globus Google Drive Connector"},{"location":"data/group-change/","text":"Group Change When your PI is changed, the primary group of your account on the cluster will also be changed. As a result, you will have a new storage space on the cluster which belongs to the new group, including Home, Project, Scratch, etc. We will change the primary group of your cluster account to the new group and will move all the files stored in your old storage space into the new storage space. However, some local installations most likely will not be able to work properly after being moved. In particular, Conda environments and R packages will fail. You need to rebuild them in your new space under the new group. 
For R packages, you just need to reinstall them with install.packages() . Rebuild a Conda Environment after Group Change We will use an example to illustrate how to rebuild a conda env after group change. Assume the conda env is originally installed in /gpfs/gibbs/project/oldgrp/user123 , and we want to move it to the project directory of the new group. First, find the paths of the conda env stored in your old space that you want to rebuild in the new space. Set two environment variables CONDA_ENVS_PATH and CONDA_PKGS_DIRS to the paths. module load miniconda export CONDA_ENVS_PATH = /gpfs/gibbs/project/oldgrp/user123/conda_envs export CONDA_PKGS_DIRS = /gpfs/gibbs/project/oldgrp/user123/conda_pkgs conda activate myenv conda env export > myenv.yml conda deactivate Now, start a new login session, submit an interactive job, and rebuild the conda env in your new storage space. When a new session starts, CONDA_ENVS_PATH and CONDA_PKGS_DIRS will be set to the right locations by the system, so you don't have to set them explicitly. ssh grace salloc module load miniconda conda env create -f myenv.yml","title":"Group Change"},{"location":"data/group-change/#group-change","text":"When your PI is changed, the primary group of your account on the cluster will also be changed. As a result, you will have a new storage space on the cluster which belongs to the new group, including Home, Project, Scratch, etc. We will change the primary group of your cluster account to the new group and will move all the files stored in your old storage space into the new storage space. However, some local installations most likely will not be able to work properly after being moved. In particular, Conda environments and R packages will fail. You need to rebuild them in your new space under the new group. For R packages, you just need to reinstall them with install.packages() .","title":"Group Change"},{"location":"data/group-change/#rebuild-a-conda-environment-after-group-change","text":"We will use an example to illustrate how to rebuild a conda env after group change. Assume the conda env is originally installed in /gpfs/gibbs/project/oldgrp/user123 , and we want to move it to the project directory of the new group. First, find the paths of the conda env stored in your old space that you want to rebuild in the new space. Set two environment variables CONDA_ENVS_PATH and CONDA_PKGS_DIRS to the paths. module load miniconda export CONDA_ENVS_PATH = /gpfs/gibbs/project/oldgrp/user123/conda_envs export CONDA_PKGS_DIRS = /gpfs/gibbs/project/oldgrp/user123/conda_pkgs conda activate myenv conda env export > myenv.yml conda deactivate Now, start a new login session, submit an interactive job, and rebuild the conda env in your new storage space. When a new session starts, CONDA_ENVS_PATH and CONDA_PKGS_DIRS will be set to the right locations by the system, so you don't have to set them explicitly. ssh grace salloc module load miniconda conda env create -f myenv.yml","title":"Rebuild a Conda Environment after Group Change"},{"location":"data/hpc-storage/","text":"HPC Storage Along with access to the compute clusters we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limit both the amount in bytes and number of files you can store. Hitting your quota stops you from being able to write data, and can cause jobs to fail . You can monitor your storage usage by running the getquota command on a cluster. 
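If a quota fills up, ordinary shell tools can help you find what is consuming it. Below is a minimal sketch, assuming the hypothetical directory name large_run and the project symlink described under Storage Spaces on this page:

# space used by each top-level directory in your project space
du -sh ~/project/*
# file count under one directory (useful when approaching a file-count quota)
find ~/project/large_run -type f | wc -l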
No sensitive data can be stored on any cluster storage, except for Milgram . Backups The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). Please see our HPC Policies page for additional information about backups. Storage Spaces For an overview of which filesystems are mounted on each cluster, see the HPC Resources documentation. Home Quota: 125 GiB and 500,000 files per person Your home directory is where your sessions begin by default. Its intended use is for storing scripts, notes, final products (e.g. figures), etc. Its path is /home/netid (where netid is your Yale netid) on every cluster. Home storage is backed up daily. If you would like to restore files, please contact us with your netid and the list of files/directories you would like restored. Project Quota: 1 TiB and 5,000,000 files per group, expanded to 4 TiB on request Project storage is shared among all members of a specific group. Project storage is not backed up , so we strongly recommend that you have a second copy somewhere off-cluster of any valuable data you have stored in project. You can access this space through a symlink, or shortcut, in your home directory called project . See our Sharing Data documentation for instructions on sharing data in your project space with other users. Project quotas are global to the whole project space, so if the group ownership on a file is your group, it will count towards your quota, regardless of its location within project . This can occasionally create confusion for users who belong to multiple groups and they need to be mindful of which files are owned by which of their group affiliations to ensure proper accounting. Purchased Storage Quota: varies Storage purchased for the dedicated use by a single group or collection of groups provides similar functionality as project storage and is also not backed up. See below for details on purchasing storage. Purchased storage, if applicable, is located on the Gibbs filesystem in a /gpfs/gibbs/pi/ directory under the group's name. Unlike project space described above, all files in your purchased storage count towards your quotas, regardless of file ownership. 60-Day Scratch Quota: 10 TiB and 15,000,000 files per group 60-day scratch is intended to be used for storing temporary data. Any file in this space older than 60 days will automatically be deleted. We send out a weekly warning about files we expect to delete the following week. Like project, scratch quota is shared by your entire research group. If we begin to run low on storage, you may be asked to delete files younger than 60 days old. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. You can access this space through a symlink, or shortcut, in your home directory called palmer_scratch (or scratch60 on Milgram ). See our Sharing Data documentation for instructions on sharing data in your scratch space with other users. Check Your Usage and Quotas To inspect your current usage, run the command getquota . Here is an example output of the command: This script shows information about your quotas on grace. 
If you plan to poll this sort of information extensively, please contact us for help at hpc@yale.edu ## Usage Details for support (as of Jan 25 2023 12:00) Fileset User Usage (GiB) File Count ---------------------- ----- ---------- ------------- gibbs:project ahs3 568 121,786 gibbs:project kln26 435 423,219 gibbs:project ms725 233 456,736 gibbs:project pl543 427 1,551,959 gibbs:project rdb9 1952 1,049,346 gibbs:project tl397 605 2,573,824 ---- gibbs:pi_support ahs3 0 1 gibbs:pi_support kln26 5886 14,514,143 gibbs:pi_support ms725 19651 2,692,158 gibbs:pi_support pl543 328 142,936 gibbs:pi_support rdb9 1047 165,553 gibbs:pi_support tl397 175 118,038 ## Quota Summary for support (as of right now [*palmer stats are gathered once a day]) Fileset Type Usage (GiB) Quota (GiB) File Count File Limit Backup Purged ---------------------- ------- ------------ ----------- ------------- ------------- --------- --------- palmer:home.grace USR 63 125 216,046 500,000 Yes No gibbs:project GRP 3832 10240 3,350,198 10,000,000 No No palmer:scratch GRP 0 10240 903 15,000,000 No 60 days gibbs:pi_support FILESET 27240 30720 17,647,694 22,000,000 No No The per-user breakdown is only generated periodically, and the summary at the bottom is close to real-time. Purchased storage allocations will only appear in the getquota output for users who have data in that directory. Purchase Additional Storage For long-term allocations, additional project storage spaces can be purchased on our Gibbs filesystem, which provides similar functionality to the primary project storage. This storage currently costs $200/TiB (minimum of 10 TiB, with exact pricing to be confirmed before a purchase is made). The price covers all costs, including administration, power, cooling, networking, etc. YCRC commits to making the storage available for 5 years from the purchase date, after which the storage allocation will need to be renewed, or the allocation will expire and be removed (see Storage Expiration Policy ). For shorter-term or smaller allocations, we have a monthly billing option. More details on this option can be found here (CAS login required). Please note that, as with existing project storage, purchased storage will not be backed up, so you should make arrangements for the safekeeping of critical files off the clusters. Please contact us with your requirements and budget to start the purchasing process. Purchased storage, as with all storage allocations, is subject to a corresponding file count limit to preserve the health of the shared storage system. The file count limits for different size allocations are listed below. If you need additional files beyond your limit, contact us to discuss, as increases may be granted on a case-by-case basis and at the YCRC's discretion. Allocation Quota File Count Limit < 50 TiB 10 million 50-99 TiB 20 million 100-499 TiB 40 million 500-999 TiB 50 million >= 1 PiB 75 million HPC Storage Best Practices Stage Data Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. If these data are too large to fit in your current quotas and you do not plan on purchasing more storage (see above), you must 'stage' your data. Since the permanent copy of the data remains on off-cluster storage, you can transfer a working copy to palmer_scratch , for example. Both Grace and McCleary have dedicated transfer partitions where you can submit long-running transfer jobs.
When your computation finishes, you can remove the copy and transmit or copy results to a permanent location. Please see the Staging Data documentation for more details and examples. Prevent Large Numbers of Small Files The parallel filesystems the clusters use perform poorly with very large numbers of small files. This is one reason we enforce file count quotas. If you are running an application that unavoidably makes large numbers of files, do what you can to reduce file creation. Additionally, you can reduce load on the filesystem by spreading the files across multiple subdirectories. Delete unneeded files between jobs and compress or archive collections of files.","title":"HPC Storage"},{"location":"data/hpc-storage/#hpc-storage","text":"Along with access to the compute clusters, we provide each research group with cluster storage space for research data. The storage is separated into three quotas: Home, Project, and 60-day Scratch. Each of these quotas limits both the amount in bytes and number of files you can store. Hitting your quota stops you from being able to write data, and can cause jobs to fail . You can monitor your storage usage by running the getquota command on a cluster. No sensitive data can be stored on any cluster storage, except for Milgram . Backups The only storage backed up on every cluster is Home. We do provide local snapshots, covering at least the last 2 days, on Home and Project directories (see below for details). Please see our HPC Policies page for additional information about backups.","title":"HPC Storage"},{"location":"data/hpc-storage/#storage-spaces","text":"For an overview of which filesystems are mounted on each cluster, see the HPC Resources documentation.","title":"Storage Spaces"},{"location":"data/hpc-storage/#home","text":"Quota: 125 GiB and 500,000 files per person Your home directory is where your sessions begin by default. Its intended use is for storing scripts, notes, final products (e.g. figures), etc. Its path is /home/netid (where netid is your Yale netid) on every cluster. Home storage is backed up daily. If you would like to restore files, please contact us with your netid and the list of files/directories you would like restored.","title":"Home"},{"location":"data/hpc-storage/#project","text":"Quota: 1 TiB and 5,000,000 files per group, expanded to 4 TiB on request Project storage is shared among all members of a specific group. Project storage is not backed up , so we strongly recommend that you have a second copy somewhere off-cluster of any valuable data you have stored in project. You can access this space through a symlink, or shortcut, in your home directory called project . See our Sharing Data documentation for instructions on sharing data in your project space with other users. Project quotas are global to the whole project space, so if the group ownership on a file is your group, it will count towards your quota, regardless of its location within project . This can occasionally create confusion for users who belong to multiple groups, who need to be mindful of which files are owned by which of their group affiliations to ensure proper accounting.","title":"Project"},{"location":"data/hpc-storage/#purchased-storage","text":"Quota: varies Storage purchased for the dedicated use of a single group or collection of groups provides similar functionality to project storage and is also not backed up. See below for details on purchasing storage.
Purchased storage, if applicable, is located on the Gibbs filesystem in a /gpfs/gibbs/pi/ directory under the group's name. Unlike project space described above, all files in your purchased storage count towards your quotas, regardless of file ownership.","title":"Purchased Storage"},{"location":"data/hpc-storage/#60-day-scratch","text":"Quota: 10 TiB and 15,000,000 files per group 60-day scratch is intended to be used for storing temporary data. Any file in this space older than 60 days will automatically be deleted. We send out a weekly warning about files we expect to delete the following week. Like project, scratch quota is shared by your entire research group. If we begin to run low on storage, you may be asked to delete files younger than 60 days old. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer term storage. You can access this space through a symlink, or shortcut, in your home directory called palmer_scratch (or scratch60 on Milgram ). See our Sharing Data documentation for instructions on sharing data in your scratch space with other users.","title":"60-Day Scratch"},{"location":"data/hpc-storage/#check-your-usage-and-quotas","text":"To inspect your current usage, run the command getquota . Here is an example output of the command: This script shows information about your quotas on grace. If you plan to poll this sort of information extensively, please contact us for help at hpc@yale.edu ## Usage Details for support (as of Jan 25 2023 12:00) Fileset User Usage (GiB) File Count ---------------------- ----- ---------- ------------- gibbs:project ahs3 568 121,786 gibbs:project kln26 435 423,219 gibbs:project ms725 233 456,736 gibbs:project pl543 427 1,551,959 gibbs:project rdb9 1952 1,049,346 gibbs:project tl397 605 2,573,824 ---- gibbs:pi_support ahs3 0 1 gibbs:pi_support kln26 5886 14,514,143 gibbs:pi_support ms725 19651 2,692,158 gibbs:pi_support pl543 328 142,936 gibbs:pi_support rdb9 1047 165,553 gibbs:pi_support tl397 175 118,038 ## Quota Summary for support (as of right now [*palmer stats are gathered once a day]) Fileset Type Usage (GiB) Quota (GiB) File Count File Limit Backup Purged ---------------------- ------- ------------ ----------- ------------- ------------- --------- --------- palmer:home.grace USR 63 125 216,046 500,000 Yes No gibbs:project GRP 3832 10240 3,350,198 10,000,000 No No palmer:scratch GRP 0 10240 903 15,000,000 No 60 days gibbs:pi_support FILESET 27240 30720 17,647,694 22,000,000 No No The per-user breakdown is only generated periodically, and the summary at the bottom is close to real-time. Purchased storage allocations will only appear in the getquota output for users who have data in that directory.","title":"Check Your Usage and Quotas"},{"location":"data/hpc-storage/#purchase-additional-storage","text":"For long-term allocations, additional project storage spaces can be purchased on our Gibbs filesystem, which provides similar functionality to the primary project storage. This storage currently costs $200/TiB (minimum of 10 TiB, with exact pricing to be confirmed before a purchase is made). The price covers all costs, including administration, power, cooling, networking, etc. YCRC commits to making the storage available for 5 years from the purchase date, after which the storage allocation will need to be renewed, or the allocation will expire and be removed (see Storage Expiration Policy ). 
For shorter-term or smaller allocations, we have a monthly billing option. More details on this option can be found here (CAS login required). Please note that, as with existing project storage, purchased storage will not be backed up, so you should make arrangements for the safekeeping of critical files off the clusters. Please contact us with your requirements and budget to start the purchasing process. Purchased storage, as with all storage allocations, is subject to a corresponding file count limit to preserve the health of the shared storage system. The file count limits for different size allocations are listed below. If you need additional files beyond your limit, contact us to discuss, as increases may be granted on a case-by-case basis and at the YCRC's discretion. Allocation Quota File Count Limit < 50 TiB 10 million 50-99 TiB 20 million 100-499 TiB 40 million 500-999 TiB 50 million >= 1 PiB 75 million","title":"Purchase Additional Storage"},{"location":"data/hpc-storage/#hpc-storage-best-practices","text":"","title":"HPC Storage Best Practices"},{"location":"data/hpc-storage/#stage-data","text":"Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. If these data are too large to fit in your current quotas and you do not plan on purchasing more storage (see above), you must 'stage' your data. Since the permanent copy of the data remains on off-cluster storage, you can transfer a working copy to palmer_scratch , for example. Both Grace and McCleary have dedicated transfer partitions where you can submit long-running transfer jobs. When your computation finishes, you can remove the copy and transmit or copy results to a permanent location. Please see the Staging Data documentation for more details and examples.","title":"Stage Data"},{"location":"data/hpc-storage/#prevent-large-numbers-of-small-files","text":"The parallel filesystems the clusters use perform poorly with very large numbers of small files. This is one reason we enforce file count quotas. If you are running an application that unavoidably makes large numbers of files, do what you can to reduce file creation. Additionally, you can reduce load on the filesystem by spreading the files across multiple subdirectories. Delete unneeded files between jobs and compress or archive collections of files.","title":"Prevent Large Numbers of Small Files"},{"location":"data/loomis-decommission/","text":"Loomis Decommission After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), was retired in December 2022. Since its inception, Loomis doubled in size to host over 2 petabytes of data for more than 600 research groups and almost 4000 individual researchers. The usage and capacity on Loomis have been replaced by two existing YCRC storage systems, Palmer and Gibbs. Unified Storage at the YCRC 2022 saw the introduction of a more unified approach to storage across the YCRC\u2019s clusters. Each group will have one project and one scratch space that are available on all of the HPC clusters (except for Milgram). Project A single project space to host no-cost project-style storage allocations is available on the Gibbs storage system. Purchased allocations are also on Gibbs under the /gpfs/gibbs/pi space of the storage system. Grace users are using this space as of the August 2022 maintenance. Scratch A single scratch space on Palmer, available for Grace users at /vast/palmer/scratch, serves both the Grace and McCleary clusters (replacement for Farnam and Ruddle).
The Loomis scratch space was decommissioned and purged on October 3, 2022. Software In 2023, a new unified software and module tree was created on Palmer, so the same software will be available for use regardless of which YCRC HPC cluster you are using. We have migrated the software located in /gpfs/loomis/apps/avx to Palmer at /vast/palmer/apps/grace.avx. To continue to support this software without interruption, we are maintaining a symlink at /gpfs/loomis/apps/avx to the new location on Palmer, so software will continue to appear as if it is on Loomis even after the maintenance, despite being hosted on Palmer. In August 2023, Grace was upgraded to Red Hat 8 and this old software tree was deprecated and is no longer supported. What about Existing Data on Loomis? Your Grace home directory was already migrated to Palmer during the January 2022 maintenance. During the Grace Maintenance in August 2022, we migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project . You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you had a project space that exceeds the no-cost allocation (4TiB), your data was migrated to a new allocation under /gpfs/gibbs/pi . In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. The Loomis scratch space was decommissioned and purged on October 3, 2022. Conda Environments By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R Packages Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages. Custom Software Installations If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Decommission of Old, Deprecated Software Trees As part of the Loomis Decommission, we did not migrate the old software trees located at /gpfs/loomis/apps/hpc, /gpfs/loomis/apps/hpc.rhel6 and /gpfs/loomis/apps/hpc.rhel7. The deprecated modules can be identified as being prefixed with \"Apps/\", \"GPU/\", \"Libs/\" or \"MPI/\" rather than beginning with the software name. If you are using software modules in one of the old trees, please find an alternative in the current supported tree or reach out to us to install a replacement. Researchers with Purchased Storage on Loomis If you had purchased space that is still active (not expired), we created a new area of the same size for you on Gibbs and transferred your data.
If you have purchased storage on /gpfs/loomis that has expired or will be expiring in 2022 and you chose not to renew, any data in that allocation is now retired.","title":"Loomis Decommission"},{"location":"data/loomis-decommission/#loomis-decommission","text":"After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), was retired in December 2022. Since its inception, Loomis doubled in size to host over 2 petabytes of data for more than 600 research groups and almost 4000 individual researchers. The usage and capacity on Loomis has been replaced by two existing YCRC storage systems, Palmer and Gibbs.","title":"Loomis Decommission"},{"location":"data/loomis-decommission/#unified-storage-at-the-ycrc","text":"2022 saw the introduction of a more unified approach to storage across the YCRC\u2019s clusters. Each group will have one project and one scratch space that is available on all of the HPC clusters (except for Milgram).","title":"Unified Storage at the YCRC"},{"location":"data/loomis-decommission/#project","text":"A single project space to host no-cost project-style storage allocations is available on the Gibbs storage system. Purchased allocations are also on Gibbs under the /gpfs/gibbs/pi space of the storage system. Grace users are using this space as of the August 2022 maintenance.","title":"Project"},{"location":"data/loomis-decommission/#scratch","text":"A single scratch space on Palmer, available for Grace users at /vast/palmer/scratch, serves both Grace and McCleary cluster (replacement for Farnam and Ruddle). The Loomis scratch space was decommissioned and purged on October 3, 2022.","title":"Scratch"},{"location":"data/loomis-decommission/#software","text":"In 2023, a new unified software and module tree was created on Palmer, so the same software will be available for use regardless of which YCRC HPC cluster you are using. We have migrated the software located in /gpfs/loomis/apps/avx to Palmer at /vast/palmer/apps/grace.avx. To continue to support this software without interruption, we are maintaining a symlink at /gpfs/loomis/apps/avx to the new location on Palmer, so software will continue to appear as if it is on Loomis even after the maintenance, despite being hosted on Palmer. In August 2023, Grace was upgraded to Red Hat 8 and this old software tree was deprecated and is no longer supported.","title":"Software"},{"location":"data/loomis-decommission/#what-about-existing-data-on-loomis","text":"Your Grace home directory was already migrated to Palmer during the January 2022 maintenance. During the Grace Maintenance in August 2022, we migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project . You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you had a project space that exceeds the no-cost allocation (4TiB), your data was migrated to a new allocation under /gpfs/gibbs/pi . In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. 
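A quick, low-risk way to find scripts that still reference the old Loomis locations is a recursive search; this is only a sketch, and the directory searched here is a placeholder:

# list files under ~/project/scripts that still mention the old Loomis project path
grep -rl "/gpfs/loomis/project" ~/project/scripts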
The Loomis scratch space was decommissioned and purged on October 3, 2022.","title":"What about Existing Data on Loomis?"},{"location":"data/loomis-decommission/#conda-environments","text":"By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation .","title":"Conda Environments"},{"location":"data/loomis-decommission/#r-packages","text":"Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages.","title":"R Packages"},{"location":"data/loomis-decommission/#custom-software-installations","text":"If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled.","title":"Custom Software Installations"},{"location":"data/loomis-decommission/#decommission-of-old-deprecated-software-trees","text":"As part of the Loomis Decommission, we did not migrate the old software trees located at /gpfs/loomis/apps/hpc, /gpfs/loomis/apps/hpc.rhel6 and /gpfs/loomis/apps/hpc.rhel7. The deprecated modules can be identified as being prefixed with \"Apps/\", \"GPU/\", \"Libs/\" or \"MPI/\" rather than beginning with the software name. If you are using software modules in one of the old trees, please find an alternative in the current supported tree or reach out to us to install a replacement.","title":"Decommission of Old, Deprecated Software Trees"},{"location":"data/loomis-decommission/#researchers-with-purchased-storage-on-loomis","text":"If you had purchased space that is still active (not expired), we created a new area of the same size for you on Gibbs and transferred your data. If you have purchased storage on /gpfs/loomis that has expired or will be expiring in 2022 and you chose not to renew, any data in that allocation is now retired.","title":"Researchers with Purchased Storage on Loomis"},{"location":"data/mccleary-transfer/","text":"Transfer data from Farnam / Ruddle to McCleary In the process of migrating from Farnam/Ruddle to McCleary, we are requesting researchers migrate their own data. Researchers are encouraged to transfer only data which is actively needed and take this opportunity to archive or delete old data. Transfers should be initiated on Ruddle's or McCleary's transfer nodes and synced to either Gibbs project directories ( /gpfs/gibbs/project/GROUP/NETID ) or their McCleary home spaces (which are mounted at /vast/palmer/home.mccleary/NETID ). All users are able to log into the transfer nodes via ssh: [ tl397@ruddle1 ~ ] $ ssh transfer [ tl397@transfer-ruddle ~ ] $ Warning Do not attempt to transfer conda environments to McCleary. Environments are not portable and will not work properly if simply copied. Instead, please export and rebuild environments following our guide . The two tools we recommend for this transfer are rsync and Globus . rsync is a command-line utility which copies files, along with their attributes, with protections against file corruption. Globus is a web app where you can schedule large transfers which occur in the background and provide notifications when complete.
Since McCleary mounts Farnam and Ruddle's filesystems, these copies are \"local\" copies and should run at high speed. rsync is best suited for smaller data transfers, while Globus is our recommended tool for larger transfers. In this short note we will detail these two approaches. Rsync While rsync is most commonly used for remote transfers between two systems, it is an excellent tool for local work as well. In particular, its ability to perform tests to make sure that files are transferred properly and to recover from interrupted transfers makes it a good option for data migration. There are many configuration possibilities, but we recommend using the following flags: rsync -avP /path/to/existing/data /path/to/new/home/for/data Here, the -a will run the transfer in archive mode, which preserves ownership, permissions, and creation/modification times. Additionally, the -v will run in verbose mode where the name of every file is printed out, and -P displays a progress bar. One subtle detail is that rsync changes its behavior based on whether the source path has a trailing / . If one initiates a sync like this: rsync -avP /path/to/existing/data /path/to/new/home/for/data the existing data directory is transferred as a whole entity, including the top-level directory data . However, if the source path includes a trailing / : rsync -avP /path/to/existing/data/ /path/to/new/home/for/data then the contents of data are transferred, omitting the top-level directory. As an example, to transfer a directory (named my_data ) from a YSM project directory on McCleary to your Gibbs project space, you can run: rsync -avP /gpfs/ysm/project/GROUP/NETID/my_data /gpfs/gibbs/project/GROUP/NETID/ Similarly, to transfer a directory ( my_code ) from your YCGA homespace to your new McCleary homespace: rsync -avP /home/NETID/my_code /vast/palmer/home.mccleary/NETID/ where GROUP and NETID are replaced by your specific group/netid. For more detailed information about rsync , please take a look at this nice tutorial ( link ). For rsync transfers that may take a while, it's best to run the transfer inside a tmux virtual login session. This enables you to \"detach\" from the session while the transfer continues in the background. tmux uses special key-strokes to control the session, with the most important being Ctrl-b d (first pressing the control and b keys, releasing, and then pressing d ) which detaches from the current session. To reattach to a detached session, run tmux attach from the same host where tmux was initially started. For more information about tmux , please see their Getting Started Guide . Globus Yale provides dedicated Globus connections for each of the clusters. Transfers can be managed through existing accounts on Ruddle using yale#ruddle , or using McCleary's Globus connection ( Yale CRC McCleary ). For a general introduction to Globus, please check out their website . We have a stand-alone docs page about Globus here , but here we will detail the process to transfer data from YSM (for example) to the Gibbs file system. log in to app.globus.org and use your Yale credentials to authenticate. navigate to the File Manager and access Ruddle or McCleary by searching for the \"collection\" yale#ruddle or Yale CRC McCleary in the left-hand panel. find the files you wish to transfer, using the check-boxes to select any and all files needed. click on the \"Transfer or Sync to\" option and in the right-hand panel also search for the same cluster's collection.
navigate through the file-browser to find the desired destination for these data (most likely gibbs_project or a subdirectory). to start the transfer, click the \"Start\" button on the left-hand side. This will start a background process to transfer all the selected files and directories to their destination. You will receive an email when the transfer completes detailing the size and average speed of the transferred data. Getting help If you run into any issues or if you would like help in setting up your data migration, please feel free to reach out to hpc@yale.edu to request one-on-one support.","title":"Transfer data from Farnam / Ruddle to McCleary"},{"location":"data/mccleary-transfer/#transfer-data-from-farnam-ruddle-to-mccleary","text":"In the process of migrating from Farnam/Ruddle to McCleary, we are requesting researchers migrate their own data. Researchers are encouraged to transfer only data which is actively needed and take this opportunity to archive or delete old data. Transfers should be initiated on Ruddle's or McCleary's transfer nodes and synced to either Gibbs project directories ( /gpfs/gibbs/project/GROUP/NETID ) or their McCleary home spaces (which are mounted at /vast/palmer/home.mccleary/NETID ). All users are able to log into the transfer nodes via ssh: [ tl397@ruddle1 ~ ] $ ssh transfer [ tl397@transfer-ruddle ~ ] $ Warning Do not attempt to transfer conda environments to McCleary. Environments are not portable and will not work properly if simply copied. Instead, please export and rebuild environments following our guide . The two tools we recommend for this transfer are rsync and Globus . rsync is a command-line utility which copies files, along with their attributes, with protections against file corruption. Globus is a web app where you can schedule large transfers which occur in the background and provide notifications when complete. Since McCleary mounts Farnam and Ruddle's filesystems, these copies are \"local\" copies and should run at high speed. rsync is best suited for smaller data transfers, while Globus is our recommended tool for larger transfers. In this short note we will detail these two approaches.","title":"Transfer data from Farnam / Ruddle to McCleary"},{"location":"data/mccleary-transfer/#rsync","text":"While rsync is most commonly used for remote transfers between two systems, it is an excellent tool for local work as well. In particular, its ability to perform tests to make sure that files are transferred properly and to recover from interrupted transfers makes it a good option for data migration. There are many configuration possibilities, but we recommend using the following flags: rsync -avP /path/to/existing/data /path/to/new/home/for/data Here, the -a will run the transfer in archive mode, which preserves ownership, permissions, and creation/modification times. Additionally, the -v will run in verbose mode where the name of every file is printed out, and -P displays a progress bar. One subtle detail is that rsync changes its behavior based on whether the source path has a trailing / . If one initiates a sync like this: rsync -avP /path/to/existing/data /path/to/new/home/for/data the existing data directory is transferred as a whole entity, including the top-level directory data . However, if the source path includes a trailing / : rsync -avP /path/to/existing/data/ /path/to/new/home/for/data then the contents of data are transferred, omitting the top-level directory.
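If you are unsure which form you need, a dry run is a safe way to preview the result: adding -n (--dry-run) to the same command lists what would be copied without transferring anything. The paths below are the same placeholders used above:

# preview the transfer without copying any data
rsync -avPn /path/to/existing/data /path/to/new/home/for/data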
As an example, to transfer a directory (named my_data ) from a YSM project directory on McCleary to your Gibbs project space, you can run: rsync -avP /gpfs/ysm/project/GROUP/NETID/my_data /gpfs/gibbs/project/GROUP/NETID/ Similarly, to transfer a directory ( my_code ) from your YCGA homespace to your new McCleary homespace: rsync -avP /home/NETID/my_code /vast/palmer/home.mccleary/NETID/ where GROUP and NETID are replaced by your specific group/netid. For more detailed information about rsync , please take a look at this nice tutorial ( link ). For rsync transfers that may take a while, it's best to run the transfer inside a tmux virtual login session. This enables you to \"detach\" from the session while the transfer continues in the background. tmux uses special key-strokes to control the session, with the most important being Ctrl-b d (first pressing the control and b keys, releasing, and then pressing d ) which detaches from the current session. To reattach to a detached session, run tmux attach from the same host where tmux was initially started. For more information about tmux , please see their Getting Started Guide .","title":"Rsync"},{"location":"data/mccleary-transfer/#globus","text":"Yale provides dedicated Globus connections for each of the clusters. Transfers can be managed through existing accounts on Ruddle using yale#ruddle , or using McCleary's Globus connection ( Yale CRC McCleary ). For a general introduction to Globus, please check out their website . We have a stand-alone docs page about Globus here , but here we will detail the process to transfer data from YSM (for example) to the Gibbs file system. log in to app.globus.org and use your Yale credentials to authenticate. navigate to the File Manager and access Ruddle or McCleary by searching for the \"collection\" yale#ruddle or Yale CRC McCleary in the left-hand panel. find the files you wish to transfer, using the check-boxes to select any and all files needed. click on the \"Transfer or Sync to\" option and in the right-hand panel also search for the same cluster's collection. navigate through the file-browser to find the desired destination for these data (most likely gibbs_project or a subdirectory). to start the transfer, click the \"Start\" button on the left-hand side. This will start a background process to transfer all the selected files and directories to their destination. You will receive an email when the transfer completes detailing the size and average speed of the transferred data.","title":"Globus"},{"location":"data/mccleary-transfer/#getting-help","text":"If you run into any issues or if you would like help in setting up your data migration, please feel free to reach out to hpc@yale.edu to request one-on-one support.","title":"Getting help"},{"location":"data/permissions/","text":"Share with Cluster Users Home Directories Do not give your home directory group write permissions. This will break your ability to log into the cluster. If you need to share files currently located in your home directory, either move them to your project directory or contact us for assistance finding an appropriate location. project and scratch60 links in Home Directories For convenience, we create a symlink, or shortcut, in every home directory called project and palmer_scratch (and ~/scratch60 on Milgram ) that go to your respective storage spaces . However, if another user attempts to access any data via your symlink, they will receive errors related to permissions for your home space.
You can run mydirectories or readlink -f dirname (replace dirname with the one you are interested in) to get the \"true\" paths, which are more readily accessible to other users. Share Data within your Group By default, all project, purchased allocation and scratch directories are readable by other members of your group. As long as they use the true path (not the shortcut in your home directory; see above), no permission changes should be needed. If you want to ensure all new files and directories you create have group write permission, add the following line to your ~/.bashrc files: umask 002 Shared Group Directories Upon request, we can set up directories for sharing scripts or data across your research group. These directories can either have read-only permissions for the group (so no one accidentally modifies something) or read and write permissions for all group members. If interested, contact us to request such a directory. Share With Specific Users or Other Groups It can be very useful to create shared directories that can be read and written by multiple users, or all members of a group. The Linux command setfacl is useful for this, but can be complicated to use. We recommend that you create a shared directory somewhere in your project or scratch directories, rather than home . When sharing a sub-directory in your project or scratch , you need to first share your project or scratch , and then share the sub-directory. Here are some simple scenarios. Share a Directory with All Members of a Group To share a new directory called shared in your project directory with group othergroup : setfacl -m g:othergroup:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m g:othergroup:rwX shared setfacl -d -m g:othergroup:rwX shared Share a Directory with a Particular Person To share a new directory called shared with a person with netid aa111 : setfacl -m u:aa111:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m u:aa111:rwX shared setfacl -d -m u:aa111:rwX shared If the shared directory already exists and contains files and directories, you should run the setfacl commands recursively, using -R: setfacl -R -m u:aa111:rwX shared setfacl -R -d -m u:aa111:rwX shared Note that only the owner of a file or directory can run setfacl on it. Remove Sharing of a Directory To remove a group othergroup from sharing of a directory called shared : setfacl -R -x g:othergroup shared To remove a person with netid aa111 from sharing of a directory called shared : setfacl -R -x u:aa111 shared","title":"Share with Cluster Users"},{"location":"data/permissions/#share-with-cluster-users","text":"","title":"Share with Cluster Users"},{"location":"data/permissions/#home-directories","text":"Do not give your home directory group write permissions. This will break your ability to log into the cluster. If you need to share files currently located in your home directory, either move them to your project directory or contact us for assistance finding an appropriate location.","title":"Home Directories"},{"location":"data/permissions/#project-and-scratch60-links-in-home-directories","text":"For convenience, we create a symlink, or shortcut, in every home directory called project and palmer_scratch (and ~/scratch60 on Milgram ) that go to your respective storage spaces . However, if another user attempts to access any data via your symlink, they will receive errors related to permissions for your home space.
You can run mydirectories or readlink -f dirname (replace dirname with the one you are interested in) to get the \"true\" paths, which are more readily accessible to other users.","title":"project and scratch60 links in Home Directories"},{"location":"data/permissions/#share-data-within-your-group","text":"By default, all project, purchased allocation and scratch directories are readable by other members of your group. As long as they use the true path (not the shortcut in your home directory; see above), no permission changes should be needed. If you want to ensure all new files and directories you create have group write permission, add the following line to your ~/.bashrc files: umask 002","title":"Share Data within your Group"},{"location":"data/permissions/#shared-group-directories","text":"Upon request, we can set up directories for sharing scripts or data across your research group. These directories can either have read-only permissions for the group (so no one accidentally modifies something) or read and write permissions for all group members. If interested, contact us to request such a directory.","title":"Shared Group Directories"},{"location":"data/permissions/#share-with-specific-users-or-other-groups","text":"It can be very useful to create shared directories that can be read and written by multiple users, or all members of a group. The Linux command setfacl is useful for this, but can be complicated to use. We recommend that you create a shared directory somewhere in your project or scratch directories, rather than home . When sharing a sub-directory in your project or scratch , you need to first share your project or scratch , and then share the sub-directory. Here are some simple scenarios.","title":"Share With Specific Users or Other Groups"},{"location":"data/permissions/#share-a-directory-with-all-members-of-a-group","text":"To share a new directory called shared in your project directory with group othergroup : setfacl -m g:othergroup:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m g:othergroup:rwX shared setfacl -d -m g:othergroup:rwX shared","title":"Share a Directory with All Members of a Group"},{"location":"data/permissions/#share-a-directory-with-a-particular-person","text":"To share a new directory called shared with a person with netid aa111 : setfacl -m u:aa111:rx $(readlink -f ~/project) cd ~/project mkdir shared setfacl -m u:aa111:rwX shared setfacl -d -m u:aa111:rwX shared If the shared directory already exists and contains files and directories, you should run the setfacl commands recursively, using -R: setfacl -R -m u:aa111:rwX shared setfacl -R -d -m u:aa111:rwX shared Note that only the owner of a file or directory can run setfacl on it.","title":"Share a Directory with a Particular Person"},{"location":"data/permissions/#remove-sharing-of-a-directory","text":"To remove a group othergroup from sharing of a directory called shared : setfacl -R -x g:othergroup shared To remove a person with netid aa111 from sharing of a directory called shared : setfacl -R -x u:aa111 shared","title":"Remove Sharing of a Directory"},{"location":"data/staging/","text":"Stage Data for Compute Jobs Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. Since the permanent home of the data remains on off-cluster storage, you need to transfer a working copy to the cluster temporarily. When your computation finishes, you would then remove the copy and transfer the results to a more permanent location.
Temporary Storage We recommend staging data into your scratch storage space on the cluster, as the working copy of the data can then be removed manually or left to be deleted (which will happen automatically after 60 days). Interactive Transfers For interactive transfers, please see our Transfer Data page for a more complete list of ways to move data efficiently to and from the clusters. A sample workflow using rsync would be: # connect to the transfer node from the login node [ netID@cluster ~ ] ssh transfer # copy data to temporary cluster storage [ netID@transfer ~ ] $ rsync -avP netID@department_server:/path/to/data $HOME/palmer_scratch/ # process data on cluster [ netID@transfer ~ ] $ sbatch data_processing.sh # return results to permanent storage for safe-keeping [ netID@transfer ~ ] $ rsync -avP $HOME/palmer_scratch/output_data netID@department_server:/path/to/outputs/ Tip To protect your transfer from network interruptions between your computer and the transfer node, launch your rsync inside a tmux session on the transfer node. Transfer Partition Both Grace and McCleary have dedicated data transfer partitions (named transfer ) designed for staging data onto the cluster. All users are able to submit jobs to these partitions. Note that each user is limited to running two transfer jobs at a time. If your workflow requires more simultaneous transfers, contact us for assistance. Transfers as Batch Jobs A sample sbatch script for an rsync transfer is shown here: #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer #SBATCH --output=transfer.txt rsync -av netID@department_server:/path/to/data $HOME/palmer_scratch/ This will launch a batch job that will transfer data from the remote server to your scratch directory. Note that this will only work if you have set up password-less logins on the remote host. Transfer Job Dependencies There are sbatch options that allow you to hold a job from running until a previous job finishes. These are called Job Dependencies, and they allow you to include a data-staging step as part of your data-processing pipeline. Consider a workflow where we would like to process data located on a remote server. We can break this into two separate Slurm jobs: a transfer job followed by a processing job. transfer.sbatch #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer rsync -av netID@department_server:/path/to/data $HOME/palmer_scratch/ process.sbatch #!/bin/bash #SBATCH --partition=day #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_process module purge module load miniconda conda activate my_env python $HOME/process_script.py $HOME/palmer_scratch/data First we would submit the transfer job to Slurm: $ sbatch transfer.sbatch Submitted batch job 12345678 Then we can pass this jobID as a dependency for the processing job: $ sbatch --dependency=afterok:12345678 process.sbatch Submitted batch job 12345679 Slurm will now hold the processing job until the transfer finishes: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 12345679 day process netID PD 0:00 1 (Dependency) 12345678 transfer transfer netID R 0:15 1 c01n04 Storage@Yale Transfers Storage@Yale shares are mounted on the transfer partition, enabling you to stage data from these remote servers. The process is somewhat simpler than the above example because we do not need to rsync the data, and can instead use cp directly.
Here, we have modified the transfer.sbatch file from above: transfer.sbatch #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer cp /SAY/standard/my_say_share/data $HOME/palmer_scratch/ This will transfer data from the Storage@Yale share to palmer_scratch where it can be processed on any of the compute nodes.","title":"Stage Data for Compute Jobs"},{"location":"data/staging/#stage-data-for-compute-jobs","text":"Large datasets are often stored off-cluster on departmental servers, Storage@Yale, in cloud storage, etc. Since the permanent home of the data remains on off-cluster storage, you need to transfer a working copy to the cluster temporarily. When your computation finishes, you would then remove the copy and transfer the results to a more permanent location.","title":"Stage Data for Compute Jobs"},{"location":"data/staging/#temporary-storage","text":"We recommend staging data into your scratch storage space on the cluster, as the working copy of the data can then be removed manually or left to be deleted (which will happen automatically after 60 days).","title":"Temporary Storage"},{"location":"data/staging/#interactive-transfers","text":"For interactive transfers, please see our Transfer Data page for a more complete list of ways to move data efficiently to and from the clusters. A sample workflow using rsync would be: # connect to the transfer node from the login node [ netID@cluster ~ ] ssh transfer # copy data to temporary cluster storage [ netID@transfer ~ ] $ rsync -avP netID@department_server:/path/to/data $HOME/palmer_scratch/ # process data on cluster [ netID@transfer ~ ] $ sbatch data_processing.sh # return results to permanent storage for safe-keeping [ netID@transfer ~ ] $ rsync -avP $HOME/palmer_scratch/output_data netID@department_server:/path/to/outputs/ Tip To protect your transfer from network interruptions between your computer and the transfer node, launch your rsync inside a tmux session on the transfer node.","title":"Interactive Transfers"},{"location":"data/staging/#transfer-partition","text":"Both Grace and McCleary have dedicated data transfer partitions (named transfer ) designed for staging data onto the cluster. All users are able to submit jobs to these partitions. Note that each user is limited to running two transfer jobs at a time. If your workflow requires more simultaneous transfers, contact us for assistance.","title":"Transfer Partition"},{"location":"data/staging/#transfers-as-batch-jobs","text":"A sample sbatch script for an rsync transfer is shown here: #!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer #SBATCH --output=transfer.txt rsync -av netID@department_server:/path/to/data $HOME/palmer_scratch/ This will launch a batch job that will transfer data from the remote server to your scratch directory. Note that this will only work if you have set up password-less logins on the remote host.","title":"Transfers as Batch Jobs"},{"location":"data/staging/#transfer-job-dependencies","text":"There are sbatch options that allow you to hold a job from running until a previous job finishes. These are called Job Dependencies, and they allow you to include a data-staging step as part of your data-processing pipeline. Consider a workflow where we would like to process data located on a remote server.
We can break this into two separate Slurm jobs: a transfer job followed by a processing job.","title":"Transfer Job Dependencies"},{"location":"data/staging/#transfersbatch","text":"#!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer rsync -av netID@department_server:/path/to/data $HOME /palmer_scratch/","title":"transfer.sbatch"},{"location":"data/staging/#processsbatch","text":"#!/bin/bash #SBATCH --partition=day #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_process module purge module load miniconda conda activate my_env python $HOME /process_script.py $HOME /palmer_scratch/data First we would submit the transfer job to Slurm: $ sbatch transfer.sbatch Submitted batch job 12345678 Then we can pass this jobID as a dependency for the processing job: $ sbatch --dependency = afterok:12345678 process.sbatch Submitted batch job 12345679 Slurm will now hold the processing job until the transfer finishes: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST ( REASON ) 12345679 day process netID PD 0 :00 1 ( Dependency ) 12345678 transfer transfer netID R 0 :15 1 c01n04","title":"process.sbatch"},{"location":"data/staging/#storageyale-transfers","text":"Storage@Yale shares are mounted on the transfer partition, enabling you to stage data from these remote servers. The process is somewhat simpler than the above example because we do not need to rsync the data, and can instead use cp directly. Here, we have modified the transfer.sbatch file from above:","title":"Storage@Yale Transfers"},{"location":"data/staging/#transfersbatch_1","text":"#!/bin/bash #SBATCH --partition=transfer #SBATCH --time=6:00:00 #SBATCH --cpus-per-task=1 #SBATCH --job-name=my_transfer cp /SAY/standard/my_say_share/data $HOME /palmer_scratch/ This will transfer data from the Storage@Yale share to palmer_scratch where it can be processed on any of the compute nodes.","title":"transfer.sbatch"},{"location":"data/transfer/","text":"Transfer Data For all transfer methods, you need to have set up your account on the cluster(s) you want to tranfer data to/from. Data Transfer Nodes Each cluster has dedicated nodes specially networked for high speed transfers both on and off-campus using the Yale Science Network. You may use transfer nodes to transfer data from your local machine using one of the below methods. From off-cluster, the nodes are accessible at the following hostnames. You must still be on-campus or on the VPN to access the transfer nodes. Cluster Transfer Node Grace transfer-grace.ycrc.yale.edu McCleary transfer-mccleary.ycrc.yale.edu Milgram transfer-milgram.ycrc.yale.edu From the login node of any cluster, you can ssh into the transfer node. This is useful for transferring data to or from locations other than your local machine (see below for details). [netID@cluster ~] ssh transfer Transferring Data to/from Your Local Machine Graphical Transfer Tools OOD Web Transfers On each cluster, you can use their respective Open OnDemand portals to transfer files. This works best for small numbers of relatively small files. You can also directly edit scripts through this interface, alleviating the need to transfer scripts to your computer to edit. MobaXterm (Windows) MobaXterm is an all-in-one graphical client for Windows that includes a transfer pane for each cluster you connect to. Once you have established a connection to the cluster, click on the \"Sftp\" tab in the left sidebar to see your files on the cluster. 
You can drag-and-drop data into and out of the SFTP pane to upload and download, respectively. Cyberduck You can also transfer files between your local computer and a cluster using an FTP client, such as Cyberduck (OSX/Windows) . You will need to configure the client with: Your netid as the \"Username\" Cluster transfer node (see above) as the \"Server\" Select your private key as the \"SSH Private Key\" Leave \"Password\" blank (you will be prompted on connection for your ssh key passphrase) An example configuration of Cyberduck is shown below. Cyberduck on McCleary and Milgram McCleary and Milgram require Multi-Factor Authentication so there are a couple additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call Large File Transfers (Globus) You can use the Globus service to perform larger data transfers between your local machine and the clusters. Globus provides a robust and resumable way to transfer larger files or datasets. Please see our Globus page for Yale-specific documentation and their official docs to get started. Command-Line Transfer Tools scp and rsync (macOS/Linux/Linux on Windows) Linux and macOS users can use scp or rsync . Use the hostname of the cluster transfer node (see above) to transfer files. These transfers must be initiated from your local machine. scp and sftp are both used from a Terminal window. The basic syntax of scp is scp [ from ] [ to ] The from and to can each be a filename or a directory/folder on the computer you are typing the command on or a remote host (e.g. the transfer node). Example: Transfer a File from Your Computer to a Cluster Using the example netid abc123 , following is run on your computer's local terminal. scp myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, myfile.txt is copied to the directory /home/fas/admins/abc123/test: on Grace. This example assumes that myfile.txt is in your current directory. You may also specify the full path of myfile.txt . scp /home/xyz/myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test Example: Transfer a Directory to a Cluster scp -r mydirectory abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, the contents of mydirectory are transferred. The -r indicates that the copy is recursive. Example: Transfer Files from the Cluster to Your Computer Assuming you would like the files copied to your current directory: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt . Note that . represents your current working directory. To specify the destination, simply replace the . with the full path: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt /path/myfolder Transfer Data to/from Other Locations Globus Endpoints Globus is a web-enabled GridFTP service that transfers large datasets fast, securely, and reliably between computers configured to be endpoints. Please see our Globus page for Yale-specific documentation and their official docs to get started. 
We have configured endpoints for most of the Yale clusters and many other institutions and compute facilities have Globus endpoints. You can also use Globus to transfer data to/from Eliapps Google Drive and S3 buckets. Cluster Transfer Nodes You can use the cluster transfer nodes to download/upload data to locations off-cluster. For data that is primarily hosted elsewhere and is only needed on the cluster temporarily, see our guide on Staging Data for additional information. For any data that hosted outside of Yale, you will need to initiate the transfer from the cluster's data transfer node as the clusters are not accessible without the VPN. On Milgram, which does not have a transfer node, you can initiate the transfers from a login node. However, please be mindful of that other users will also be using the login nodes for regular cluster operations. Tip If you are running a large transfer without Globus , run it inside a tmux session on the transfer node. This protects your transfer from network interruptions between your computer and the transfer node. rsync # connect to the transfer node from the login node [netID@cluster ~] ssh transfer # copy data to cluster storage [netID@transfer ~]$ rsync -avP netID@department_server:/path/to/data $HOME/scratch60/ Rclone To move data to and from cloud storage (Box, Dropbox, Wasabi, AWS S3, or Google Cloud Storage, etc.), we recommend using Rclone . It is installed on all of the clusters and can be installed on your computer. You will need to configure it for each kind of storage you would like to transfer to with: rclone configure You'll be prompted for a name for the connection (e.g mys3), and then details about the connection. Once you've saved that configuration, you can connect to the transfer node (using ssh transfer from the login node) and then use that connection name to copy files with similar syntax to scp and rsync : rclone copy localpath/myfile mys3:bucketname/ rclone sync localpath/mydir mys3:bucketname/remotedir We recommend that you protect your configurations with a password. You'll see that as an option when you run rclone config. Please see our Rclone page for additional information on how to set up and use Rclone on the YCRC clusters. For all the Rclone documentaion please refer to the official site . Sites Behind a VPN If you need to transfer data to or from an external site that is only accessible via VPN, please contact us for assistance as we might be able to provide a workaround to enable a direct transfer between the YCRC clusters and your external site.","title":"Transfer to Cluster"},{"location":"data/transfer/#transfer-data","text":"For all transfer methods, you need to have set up your account on the cluster(s) you want to tranfer data to/from.","title":"Transfer Data"},{"location":"data/transfer/#data-transfer-nodes","text":"Each cluster has dedicated nodes specially networked for high speed transfers both on and off-campus using the Yale Science Network. You may use transfer nodes to transfer data from your local machine using one of the below methods. From off-cluster, the nodes are accessible at the following hostnames. You must still be on-campus or on the VPN to access the transfer nodes. Cluster Transfer Node Grace transfer-grace.ycrc.yale.edu McCleary transfer-mccleary.ycrc.yale.edu Milgram transfer-milgram.ycrc.yale.edu From the login node of any cluster, you can ssh into the transfer node. This is useful for transferring data to or from locations other than your local machine (see below for details). 
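Once a remote has been defined with rclone config (described below), it can be worth sanity-checking it from the transfer node before launching a large copy. This is a minimal sketch that assumes a remote named mys3, the same example name used in the configuration instructions below:
# list the remotes you have configured
rclone listremotes
# list the top-level buckets/directories on that remote
rclone lsd mys3:
# preview a copy without transferring any data
rclone copy --dry-run localpath/mydir mys3:bucketname/remotedir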
[netID@cluster ~] ssh transfer","title":"Data Transfer Nodes"},{"location":"data/transfer/#transferring-data-tofrom-your-local-machine","text":"","title":"Transferring Data to/from Your Local Machine"},{"location":"data/transfer/#graphical-transfer-tools","text":"","title":"Graphical Transfer Tools"},{"location":"data/transfer/#ood-web-transfers","text":"On each cluster, you can use their respective Open OnDemand portals to transfer files. This works best for small numbers of relatively small files. You can also directly edit scripts through this interface, alleviating the need to transfer scripts to your computer to edit.","title":"OOD Web Transfers"},{"location":"data/transfer/#mobaxterm-windows","text":"MobaXterm is an all-in-one graphical client for Windows that includes a transfer pane for each cluster you connect to. Once you have established a connection to the cluster, click on the \"Sftp\" tab in the left sidebar to see your files on the cluster. You can drag-and-drop data into and out of the SFTP pane to upload and download, respectively.","title":"MobaXterm (Windows)"},{"location":"data/transfer/#cyberduck","text":"You can also transfer files between your local computer and a cluster using an FTP client, such as Cyberduck (OSX/Windows) . You will need to configure the client with: Your netid as the \"Username\" Cluster transfer node (see above) as the \"Server\" Select your private key as the \"SSH Private Key\" Leave \"Password\" blank (you will be prompted on connection for your ssh key passphrase) An example configuration of Cyberduck is shown below.","title":"Cyberduck"},{"location":"data/transfer/#cyberduck-on-mccleary-and-milgram","text":"McCleary and Milgram require Multi-Factor Authentication so there are a couple additional configuration steps. Under Cyberduck > Preferences > Transfers > General change the setting to \"Use browser connection\" instead of \"Open multiple connections\". When you connect type one of the following when prompted with a \"Partial authentication success\" window. \"push\" to receive a push notification to your smart phone (requires the Duo mobile app) \"sms\" to receive a verification passcode via text message \"phone\" to receive a phone call","title":"Cyberduck on McCleary and Milgram"},{"location":"data/transfer/#large-file-transfers-globus","text":"You can use the Globus service to perform larger data transfers between your local machine and the clusters. Globus provides a robust and resumable way to transfer larger files or datasets. Please see our Globus page for Yale-specific documentation and their official docs to get started.","title":"Large File Transfers (Globus)"},{"location":"data/transfer/#command-line-transfer-tools","text":"","title":"Command-Line Transfer Tools"},{"location":"data/transfer/#scp-and-rsync-macoslinuxlinux-on-windows","text":"Linux and macOS users can use scp or rsync . Use the hostname of the cluster transfer node (see above) to transfer files. These transfers must be initiated from your local machine. scp and sftp are both used from a Terminal window. The basic syntax of scp is scp [ from ] [ to ] The from and to can each be a filename or a directory/folder on the computer you are typing the command on or a remote host (e.g. the transfer node).","title":"scp and rsync (macOS/Linux/Linux on Windows)"},{"location":"data/transfer/#example-transfer-a-file-from-your-computer-to-a-cluster","text":"Using the example netid abc123 , following is run on your computer's local terminal. 
scp myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, myfile.txt is copied to the directory /home/fas/admins/abc123/test: on Grace. This example assumes that myfile.txt is in your current directory. You may also specify the full path of myfile.txt . scp /home/xyz/myfile.txt abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test","title":"Example: Transfer a File from Your Computer to a Cluster"},{"location":"data/transfer/#example-transfer-a-directory-to-a-cluster","text":"scp -r mydirectory abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/test In this example, the contents of mydirectory are transferred. The -r indicates that the copy is recursive.","title":"Example: Transfer a Directory to a Cluster"},{"location":"data/transfer/#example-transfer-files-from-the-cluster-to-your-computer","text":"Assuming you would like the files copied to your current directory: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt . Note that . represents your current working directory. To specify the destination, simply replace the . with the full path: scp abc123@transfer-grace.ycrc.yale.edu:/home/fas/admins/abc123/myfile.txt /path/myfolder","title":"Example: Transfer Files from the Cluster to Your Computer"},{"location":"data/transfer/#transfer-data-tofrom-other-locations","text":"","title":"Transfer Data to/from Other Locations"},{"location":"data/transfer/#globus-endpoints","text":"Globus is a web-enabled GridFTP service that transfers large datasets fast, securely, and reliably between computers configured to be endpoints. Please see our Globus page for Yale-specific documentation and their official docs to get started. We have configured endpoints for most of the Yale clusters and many other institutions and compute facilities have Globus endpoints. You can also use Globus to transfer data to/from Eliapps Google Drive and S3 buckets.","title":"Globus Endpoints"},{"location":"data/transfer/#cluster-transfer-nodes","text":"You can use the cluster transfer nodes to download/upload data to locations off-cluster. For data that is primarily hosted elsewhere and is only needed on the cluster temporarily, see our guide on Staging Data for additional information. For any data that hosted outside of Yale, you will need to initiate the transfer from the cluster's data transfer node as the clusters are not accessible without the VPN. On Milgram, which does not have a transfer node, you can initiate the transfers from a login node. However, please be mindful of that other users will also be using the login nodes for regular cluster operations. Tip If you are running a large transfer without Globus , run it inside a tmux session on the transfer node. This protects your transfer from network interruptions between your computer and the transfer node.","title":"Cluster Transfer Nodes"},{"location":"data/transfer/#rsync","text":"# connect to the transfer node from the login node [netID@cluster ~] ssh transfer # copy data to cluster storage [netID@transfer ~]$ rsync -avP netID@department_server:/path/to/data $HOME/scratch60/","title":"rsync"},{"location":"data/transfer/#rclone","text":"To move data to and from cloud storage (Box, Dropbox, Wasabi, AWS S3, or Google Cloud Storage, etc.), we recommend using Rclone . It is installed on all of the clusters and can be installed on your computer. 
You will need to configure it for each kind of storage you would like to transfer to with: rclone config You'll be prompted for a name for the connection (e.g. mys3), and then details about the connection. Once you've saved that configuration, you can connect to the transfer node (using ssh transfer from the login node) and then use that connection name to copy files with similar syntax to scp and rsync : rclone copy localpath/myfile mys3:bucketname/ rclone sync localpath/mydir mys3:bucketname/remotedir We recommend that you protect your configurations with a password. You'll see that as an option when you run rclone config. Please see our Rclone page for additional information on how to set up and use Rclone on the YCRC clusters. For the full Rclone documentation, please refer to the official site .","title":"Rclone"},{"location":"data/transfer/#sites-behind-a-vpn","text":"If you need to transfer data to or from an external site that is only accessible via VPN, please contact us for assistance as we might be able to provide a workaround to enable a direct transfer between the YCRC clusters and your external site.","title":"Sites Behind a VPN"},{"location":"data/ycga-data/","text":"YCGA Data Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga/sequencers on McCleary . YCGA Access Retention Policy The McCleary high-performance computing system has specific resources that are dedicated to YCGA users. This includes a Slurm partition (\u2018ycga\u2019) and a large parallel storage system (/gpfs/ycga). The following policy guidelines govern the use of these resources on McCleary for data storage and analysis. Yale University Faculty User All Yale PIs using YCGA for library preparation and/or sequencing will have an additional 5 TB storage area called \u2018work\u2019 for data storage. This is in addition to the 5 TB storage area called \u2018project\u2019 that all McCleary groups receive. Currently, neither work nor project storage is backed up. Users are responsible for protecting their own data. All Fastq files are available on the /gpfs/ycga storage system for one year. After that, the files are available in an archive that allows self-service retrieval, as described in the link above. Issues or questions about archived data can be addressed to ycga@yale.edu.
During this time, the PI or one lab member from the new lab will be provided access to the HPC system. Request for Guest NetID should be made to their business office. Guest NetID will be valid for one year. Any new Yale faculty member will be given access to McCleary once they start using YCGA services. Users not utilizing the YCGA services will not be provided access to McCleary high- performance computing system. External Collaborators Access to McCleary can be granted to collaborating labs, with the authorization of the respective Yale PI. A maximum of one account per collaborating lab will be granted. Furthermore, such approval will be terminated upon request from the PI. Request for a Sponsored Identity NetID should be made to the Yale PI\u2019s business office. Guest NetID will be valid for one year. The expectation is that the collaborator, with PI consent, will download data from the McCleary HPC system to their own internal system for data analysis. Non-Yale Users Users not affiliated with Yale University will not be provided access to McCleary high- performance computing system. YCGA Data Retention Policy Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files). ~45 days after sequencing, the raw bcl files are deleted. ~60 days after sequencing, the fastq files are written to an archive. This archive exists in two geographically distinct copies for safety. ~365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the archive. Data is retained on the archive indefinitely. See below for instructions for retrieving archived data. All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the archive. Disaster recovery is provided by the archive copy. YCGA will send you an email informing you that your data is ready, and will include a url that looks like: http://fcb.ycga.yale.edu:3010/ randomstring /sample_dir_001 You can use that link to download your data in a browser, but if you plan to process the data on McCleary, it is better to make a soft link to the data, rather than copying it. To find the actual location of your data, do: $ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001 Illumina sequencing data For Illumina data (not singlecell or pacbio data), you can browse to the YCGA-provided URL and find a file ruddle_paths.txt that contains the true locations of the files. Alternatively, you can use the ycgaFastq tool to easily make soft links to the sequencing files: export PATH = $PATH :/gpfs/gibbs/pi/ycga/mane/ycga_bioinfo/bin_May2023 $ ycgaFastq fcb.ycga.yale.edu:3010/randomstring/sample_dir_001 ycgaFastq can also be used to retrieve data that has been archived. 
The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name: $ ycgaFastq rdb9 AHFH66DSXX If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that, even if the run has been archived and deleted: $ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438 If you have a manifest file that contains the paths to all of the data files in a dataset, you can use ycgaFastq as well: $ ycgaFastq manifest.txt ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us. Tip Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects, we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our instructions below . Retrieve Data from the Archive Info The sequence archive /SAY/archive/YCGA-729009-YCGA-A2 is only mounted on the transfer node and transfer partition. You must ssh to transfer, or submit a job (batch or interactive) to the transfer partition, in order to access and download archived sequence data. In the sequencing data archive, a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of a file, it is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/ /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/ /ycga-ba/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ /gpfs/ycga/sequencers/illumina/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/ /gpfs/gibbs/pi/ycga/pacbio/ /SAY/archive/YCGA-729009-YCGA-A2/archive/pacbio/ You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files (over 500GB) sometimes fail to download.
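Once a project tarball has been copied into your scratch directory, it can help to list its contents before extracting, to confirm it is the one you want. A minimal sketch, where file.tar is a placeholder for your project tarball:
# show the first few entries of the tarball without extracting it
tar -tvf file.tar | head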
Example Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Get session on transfer partition salloc -p transfer module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar , matching our netid. step 4 First, copy the tarball to scratch. To do this you must be on the transfer partition or transfer node, since /SAY is only mounted there. cd ~/palmer_scratch rsync -L -v /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar . step 5 Submit a batch job to use the restore utility to uncompress the fastq files from the tar file. In our example we'll use 32 cpus. This is not done using the transfer partition, but rather the day partition, since day will allow you more cpus. The restore will likely take several minutes. To see progress, you can use the -v flag. Put the following code in a batch script (e.g. myrestore.sh): #/bin/bash #SBATCH -c 32 #SBATCH -p day module load ycga-public restore -v -n $SLURM_CPUS_PER_TASK -t 210305_D00306_1337_BHJWKHBCX3_1_Unaligned-1_Project_Rdb9.tar Then submit the job using sbatch: sbatch myrestore.sh The restored fastq files will written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"YCGA Data"},{"location":"data/ycga-data/#ycga-data","text":"Data associated with YCGA projects and sequencers are located on the YCGA storage system, accessible at /gpfs/ycga/sequencers on McCleary .","title":"YCGA Data"},{"location":"data/ycga-data/#ycga-access-retention-policy","text":"The McCleary high-performance computing system has specific resources that are dedicated to YCGA users. This includes a slurm partition (\u2018ycga\u2019) and a large parallel storage system (/gpfs/ycga). The following policy guidelines govern the use of these resources on McCleary for data storage and analysis.","title":"YCGA Access Retention Policy"},{"location":"data/ycga-data/#yale-university-faculty-user","text":"All Yale PIs using YCGA for library preparation and/or sequencing will have an additional 5 TB storage area called \u2018work\u2019 for data storage. This is in addition to the 5 TB storage area called \u2018project\u2019 that all McCleary groups receive. Currently, neither work or project storage is backed up. Users are responsible for protecting their own data. All Fastq files are available on the /gpfs/ycga storage system for one year. After that, the files are available in an archive that allows self-service retrieval, as described in the link above. Issues or questions about archived data can be addressed to ycga@yale.edu. 
Users processing sequence data on McCleary should be careful to submit their jobs to the \u2018ycga\u2019 partition. Jobs submitted to other partitions may incur additional charges. Members of Yale PI labs using YCGA for library preparation and/or sequencing may apply for accounts on McCleary with PI\u2019s approval. Each Yale PI lab will have a dedicated secure directory to store their data, and permission to lab members will be granted with the authorization of the respective PI. Furthermore, such approval will be terminated upon request from the PI or termination of Yale Net ID. Lab members moving to a new university will get access to HPC resources for an additional six months only upon permission from Yale PI. If Yale NetID is no longer accessible, former Yale members who were YCGA users should request a Sponsored Identity NetID from their business office. Sponsored Identity NetIDs will be valid for six months. Such users will also need to request VPN access. A PI moving to a new university to establish their lab will have access to their data for one year from the termination of their Yale position. During this time, the PI or one lab member from the new lab will be provided access to the HPC system. Request for Guest NetID should be made to their business office. Guest NetID will be valid for one year. Any new Yale faculty member will be given access to McCleary once they start using YCGA services. Users not utilizing the YCGA services will not be provided access to McCleary high- performance computing system.","title":"Yale University Faculty User"},{"location":"data/ycga-data/#external-collaborators","text":"Access to McCleary can be granted to collaborating labs, with the authorization of the respective Yale PI. A maximum of one account per collaborating lab will be granted. Furthermore, such approval will be terminated upon request from the PI. Request for a Sponsored Identity NetID should be made to the Yale PI\u2019s business office. Guest NetID will be valid for one year. The expectation is that the collaborator, with PI consent, will download data from the McCleary HPC system to their own internal system for data analysis.","title":"External Collaborators"},{"location":"data/ycga-data/#non-yale-users","text":"Users not affiliated with Yale University will not be provided access to McCleary high- performance computing system.","title":"Non-Yale Users"},{"location":"data/ycga-data/#ycga-data-retention-policy","text":"Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files). ~45 days after sequencing, the raw bcl files are deleted. ~60 days after sequencing, the fastq files are written to an archive. This archive exists in two geographically distinct copies for safety. ~365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the archive. Data is retained on the archive indefinitely. See below for instructions for retrieving archived data. All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the archive. Disaster recovery is provided by the archive copy. 
YCGA will send you an email informing you that your data is ready, and will include a url that looks like: http://fcb.ycga.yale.edu:3010/ randomstring /sample_dir_001 You can use that link to download your data in a browser, but if you plan to process the data on McCleary, it is better to make a soft link to the data, rather than copying it. To find the actual location of your data, do: $ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001","title":"YCGA Data Retention Policy"},{"location":"data/ycga-data/#illumina-sequencing-data","text":"For Illumina data (not singlecell or pacbio data), you can browse to the YCGA-provided URL and find a file ruddle_paths.txt that contains the true locations of the files. Alternatively, you can use the ycgaFastq tool to easily make soft links to the sequencing files: export PATH = $PATH :/gpfs/gibbs/pi/ycga/mane/ycga_bioinfo/bin_May2023 $ ycgaFastq fcb.ycga.yale.edu:3010/randomstring/sample_dir_001 ycgaFastq can also be used to retrieve data that has been archived. The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name: $ ycgaFastq rdb9 AHFH66DSXX If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that, even if the run has been archived and deleted: $ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438 If you have a manifest file that contains the paths to all of the data files in a dataset, you can use ycgaFastq as well: $ ycgaFastq manifest.txt ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us. Tip Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our below .","title":"Illumina sequencing data"},{"location":"data/ycga-data/#retrieve-data-from-the-archive","text":"Info The sequence archive /SAY/archive/YCGa-72009-YCGA-A2 is only mounted on the transfer node and transfer partition. You must ssh to transfer, or submit a job (batch or interactive) to the transfer partition, in order to access and download archived sequence data. In the sequencing data archive, a directory exists for each run, holding one or more tar files. There is a main tar file, plus a tar file for each project directory. Most users only need the project tar file corresponding to their data. Although the archive actually exists in cloud storage, you can treat it as a regular directory tree. Many operations such as ls , cd , etc. are very fast, since directory structures and file metadata are on a disk cache. However, when you actually read the contents of files the file is retrieved and read into a disk cache. This can take some time. Archived runs are stored in the following locations. Original location Archive location /panfs/ /SAY/archive/YCGA-729009-YCGA-A2/archive/panfs/ /ycga-ba/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-ba/ /gpfs/ycga/sequencers/illumina/ /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/ /gpfs/gibbs/pi/ycga/pacbio/ /SAY/archive/YCGA-729009-YCGA-A2/archive/pacbio/ You can directly copy or untar the project tarfile into a scratch directory. Info Very large tar files over 500GB, sometimes fail to download. 
If you run into problems, contact us at hpc@yale.edu and we can manually download it. cd ~/palmer_scratch/somedir tar -xvf /SAY/archive/YCGA-729009-YCGA-A2/archive/path/to/file.tar Inside the project tar files are the fastq files, which have been compressed using quip . If your pipeline cannot read quip files directly, you will need to uncompress them before using them. module load Quip quip -d M20_ACAGTG_L008_R1_009.fastq.qp If you have trouble locating your files, you can use the utility locateRun , using any substring of the original run name. locateRun is in the ycga-public module. module load ycga-public locateRun C9374AN Tip When retrieving data, run untar/unquip as a job on a compute node, not a login node, and make sure to allocate sufficient resources to your job, e.g. -c 20 --mem=100G . Tip The ycgaFastq tool can also be used to recover archived data. See above .","title":"Retrieve Data from the Archive"},{"location":"data/ycga-data/#example","text":"Imagine that user rdb9 wants to restore data from run BHJWZZBCX3 step 1 Get a session on the transfer partition salloc -p transfer module load ycga-public step 2 Find the run location $ locateRun BHJWZZBCX3 /ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3.deleted /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 Note that the original run location has been deleted, but the archive location is listed. step 3 List the contents of the archived run, and locate the desired project tarball: $ ls -1 /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3 210305_D00306_1337_BHJWZZBCX3_0.tar 210305_D00306_1337_BHJWZZBCX3_0_Unaligned_Project_Jdm222.tar 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar 210305_D00306_1337_BHJWZZBCX3_2021_05_09_04:00:36_archive.log We want 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar , matching our netid. step 4 First, copy the tarball to scratch. To do this you must be on the transfer partition or transfer node, since /SAY is only mounted there. cd ~/palmer_scratch rsync -L -v /SAY/archive/YCGA-729009-YCGA-A2/archive/ycga-gpfs/sequencers/illumina/sequencerV/runs/210305_D00306_1337_BHJWZZBCX3/210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar . step 5 Submit a batch job to use the restore utility to uncompress the fastq files from the tar file. In our example, we'll use 32 cpus. This is not done using the transfer partition, but rather the day partition, since day will allow you more cpus. The restore will likely take several minutes. To see progress, you can use the -v flag. Put the following code in a batch script (e.g.
myrestore.sh): #!/bin/bash #SBATCH -c 32 #SBATCH -p day module load ycga-public restore -v -n $SLURM_CPUS_PER_TASK -t 210305_D00306_1337_BHJWZZBCX3_1_Unaligned-1_Project_Rdb9.tar Then submit the job using sbatch: sbatch myrestore.sh The restored fastq files will be written to a directory like this: 210305_D00306_1337_BHJWZZBCX3/Data/Intensities/BaseCalls/Unaligned*/Project_*","title":"Example"},{"location":"news/2022-02-grace/","text":"Grace Maintenance February 3-6, 2022 Software Updates Latest security patches applied Slurm updated to version 21.08.5 NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01) Singularity updated to version 3.8.5 Open OnDemand updated to version 2.0.20 Hardware Updates Changes have been made to networking to improve performance of certain older compute nodes Changes to Grace Home Directories During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data-intensive compute jobs. Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/ to /vast/palmer/home.grace/ . Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. Please update any scripts and workflows accordingly. Interactive Jobs We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash . In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables. Palmer scratch Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.","title":"2022 02 grace"},{"location":"news/2022-02-grace/#grace-maintenance","text":"February 3-6, 2022","title":"Grace Maintenance"},{"location":"news/2022-02-grace/#software-updates","text":"Latest security patches applied Slurm updated to version 21.08.5 NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01) Singularity updated to version 3.8.5 Open OnDemand updated to version 2.0.20","title":"Software Updates"},{"location":"news/2022-02-grace/#hardware-updates","text":"Changes have been made to networking to improve performance of certain older compute nodes","title":"Hardware Updates"},{"location":"news/2022-02-grace/#changes-to-grace-home-directories","text":"During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data-intensive compute jobs. Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/ to /vast/palmer/home.grace/ . Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable.
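For example, a hard-coded home path in a job script can simply be replaced with the variable; an illustrative sketch (netID and my_project are placeholders):
# before the migration, hard-coded:
cd /gpfs/loomis/home.grace/netID/my_project
# portable version that keeps working after the move:
cd $HOME/my_project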
Please update any scripts and workflows accordingly.","title":"Changes to Grace Home Directories"},{"location":"news/2022-02-grace/#interactive-jobs","text":"We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash . In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables.","title":"Interactive Jobs"},{"location":"news/2022-02-grace/#palmer-scratch","text":"Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.","title":"Palmer scratch"},{"location":"news/2022-02/","text":"February 2022 Announcements Grace Maintenance The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. Data Transfers For non-Milgram users doing data transfers, transfers should not be performed on the login nodes. We have a few alternative ways to get better networking and reduce the impact on the clusters\u2019 login nodes: Dedicated transfer node . Each cluster has a dedicated transfer node, transfer-.hpc.yale.edu . You can ssh directly to this node and run commands. \u201ctransfer\u201d Slurm partition . This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer \u2026 *For recurring or periodic data transfers (such as using cron), please use Slurm\u2019s scrontab to schedule jobs that run on the transfer partition instead. Globus . For robust transfers of larger amount of data, see our Globus documentation. More info about data transfers can be found in our Data Transfer documentation. Software Highlights Rclone is now installed on all nodes and loading the module is no longer necessary. MATLAB/2021b is now on all clusters. Julia/1.7.1-linux-x86_64 is now on all clusters. Mathematica/13.0.0 is now on Grace. QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace. Mathematica documentation has been updated with regards to configuring parallel jobs.","title":"2022 02"},{"location":"news/2022-02/#february-2022","text":"","title":"February 2022"},{"location":"news/2022-02/#announcements","text":"","title":"Announcements"},{"location":"news/2022-02/#grace-maintenance","text":"The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.","title":"Grace Maintenance"},{"location":"news/2022-02/#data-transfers","text":"For non-Milgram users doing data transfers, transfers should not be performed on the login nodes. We have a few alternative ways to get better networking and reduce the impact on the clusters\u2019 login nodes: Dedicated transfer node . Each cluster has a dedicated transfer node, transfer-.hpc.yale.edu . You can ssh directly to this node and run commands. \u201ctransfer\u201d Slurm partition . This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer \u2026 *For recurring or periodic data transfers (such as using cron), please use Slurm\u2019s scrontab to schedule jobs that run on the transfer partition instead. 
Globus . For robust transfers of larger amount of data, see our Globus documentation. More info about data transfers can be found in our Data Transfer documentation.","title":"Data Transfers"},{"location":"news/2022-02/#software-highlights","text":"Rclone is now installed on all nodes and loading the module is no longer necessary. MATLAB/2021b is now on all clusters. Julia/1.7.1-linux-x86_64 is now on all clusters. Mathematica/13.0.0 is now on Grace. QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace. Mathematica documentation has been updated with regards to configuring parallel jobs.","title":"Software Highlights"},{"location":"news/2022-03/","text":"March 2022 Announcements Snapshots Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions. OOD File Browser Tip: Shortcuts You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts. Software Highlights R/4.1.0-foss-2020b is now on Grace. GCC/11.2.0 is now on Grace.","title":"2022 03"},{"location":"news/2022-03/#march-2022","text":"","title":"March 2022"},{"location":"news/2022-03/#announcements","text":"","title":"Announcements"},{"location":"news/2022-03/#snapshots","text":"Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions.","title":"Snapshots"},{"location":"news/2022-03/#ood-file-browser-tip-shortcuts","text":"You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts.","title":"OOD File Browser Tip: Shortcuts"},{"location":"news/2022-03/#software-highlights","text":"R/4.1.0-foss-2020b is now on Grace. GCC/11.2.0 is now on Grace.","title":"Software Highlights"},{"location":"news/2022-04-farnam/","text":"Farnam Maintenance April 4-7, 2022 Software Updates Security updates Slurm updated to 21.08.6 NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01) Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Open OnDemand updated to 2.0.20 Hardware Updates Four new nodes with 4 NVIDIA GTX3090 GPUs each have been added Changes to the bigmem Partition Jobs requesting less than 120G of memory are no longer allowed in the \"bigmem\" partition. Please submit these jobs to the general or scavenge partitions instead. Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 04 farnam"},{"location":"news/2022-04-farnam/#farnam-maintenance","text":"April 4-7, 2022","title":"Farnam Maintenance"},{"location":"news/2022-04-farnam/#software-updates","text":"Security updates Slurm updated to 21.08.6 NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01) Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Open OnDemand updated to 2.0.20","title":"Software Updates"},{"location":"news/2022-04-farnam/#hardware-updates","text":"Four new nodes with 4 NVIDIA GTX3090 GPUs each have been added","title":"Hardware Updates"},{"location":"news/2022-04-farnam/#changes-to-the-bigmem-partition","text":"Jobs requesting less than 120G of memory are no longer allowed in the \"bigmem\" partition. Please submit these jobs to the general or scavenge partitions instead.","title":"Changes to the bigmem Partition"},{"location":"news/2022-04-farnam/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-04/","text":"April 2022 Announcements Updates to R on Open OnDemand RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st. Improvements to R install.packages Paths Starting with the R 4.1.0 software module, we now automatically set an environment variable ( R_LIBS_USER ) which directs these packages to be stored in your project space. This will helps ensure that packages are not limited by home-space quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change. Instructions for Running a MySQL Server on the Clusters Occasionally it could be useful for a user to run their own MySQL database server on one of the clusters. Until now, that has not been possible, but recently we found a way using singularity. Instructions may be found in our new MySQL guide . Software Highlights R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation . R/4.1.0-foss-2020b is now available on all clusters. Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.","title":"2022 04"},{"location":"news/2022-04/#april-2022","text":"","title":"April 2022"},{"location":"news/2022-04/#announcements","text":"","title":"Announcements"},{"location":"news/2022-04/#updates-to-r-on-open-ondemand","text":"RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st.","title":"Updates to R on Open OnDemand"},{"location":"news/2022-04/#improvements-to-r-installpackages-paths","text":"Starting with the R 4.1.0 software module, we now automatically set an environment variable ( R_LIBS_USER ) which directs these packages to be stored in your project space. 
This will helps ensure that packages are not limited by home-space quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change.","title":"Improvements to R install.packages Paths"},{"location":"news/2022-04/#instructions-for-running-a-mysql-server-on-the-clusters","text":"Occasionally it could be useful for a user to run their own MySQL database server on one of the clusters. Until now, that has not been possible, but recently we found a way using singularity. Instructions may be found in our new MySQL guide .","title":"Instructions for Running a MySQL Server on the Clusters"},{"location":"news/2022-04/#software-highlights","text":"R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation . R/4.1.0-foss-2020b is now available on all clusters. Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.","title":"Software Highlights"},{"location":"news/2022-05-ruddle/","text":"Ruddle Maintenance May 2, 2022 Software Updates Security updates Slurm updated to 21.08.7 Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 05 ruddle"},{"location":"news/2022-05-ruddle/#ruddle-maintenance","text":"May 2, 2022","title":"Ruddle Maintenance"},{"location":"news/2022-05-ruddle/#software-updates","text":"Security updates Slurm updated to 21.08.7 Singularity replaced by Apptainer version 1.0.1 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7","title":"Software Updates"},{"location":"news/2022-05-ruddle/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-05/","text":"May 2022 Announcements Ruddle Maintenance The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details. Remote Visualization with Hardware Acceleration VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/ . Software Highlights Singularity is now called \"Apptainer\". Singularity is officially named \u201cApptainer\u201d as part of its move to the Linux Foundation. The new command apptainer works as drop-in replacement for singularity . However, the previous singularity command will also continue to work for the foreseeable future so no change is needed. 
The upgrade to Apptainer is on Grace, Farnam and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance. Slurm has been upgraded to version 21.08.6 on Grace MATLAB/2022a is available on all clusters","title":"2022 05"},{"location":"news/2022-05/#may-2022","text":"","title":"May 2022"},{"location":"news/2022-05/#announcements","text":"","title":"Announcements"},{"location":"news/2022-05/#ruddle-maintenance","text":"The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.","title":"Ruddle Maintenance"},{"location":"news/2022-05/#remote-visualization-with-hardware-acceleration","text":"VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/ .","title":"Remote Visualization with Hardware Acceleration"},{"location":"news/2022-05/#software-highlights","text":"Singularity is now called \"Apptainer\". Singularity is officially named \u201cApptainer\u201d as part of its move to the Linux Foundation. The new command apptainer works as drop-in replacement for singularity . However, the previous singularity command will also continue to work for the foreseeable future so no change is needed. The upgrade to Apptainer is on Grace, Farnam and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance. Slurm has been upgraded to version 21.08.6 on Grace MATLAB/2022a is available on all clusters","title":"Software Highlights"},{"location":"news/2022-06-milgram/","text":"Milgram Maintenance June 7-8, 2022 Software Updates Security updates Slurm updated to 21.08.8-2 NVIDIA drivers updated to 515.43.04 Singularity replaced by Apptainer version 1.0.2 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.23 Hardware Updates The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions. Changes to non-interactive sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 06 milgram"},{"location":"news/2022-06-milgram/#milgram-maintenance","text":"June 7-8, 2022","title":"Milgram Maintenance"},{"location":"news/2022-06-milgram/#software-updates","text":"Security updates Slurm updated to 21.08.8-2 NVIDIA drivers updated to 515.43.04 Singularity replaced by Apptainer version 1.0.2 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.23","title":"Software Updates"},{"location":"news/2022-06-milgram/#hardware-updates","text":"The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions.","title":"Hardware Updates"},{"location":"news/2022-06-milgram/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to non-interactive sessions"},{"location":"news/2022-06/","text":"June 2022 Announcements Farnam Decommission & McCleary Announcement After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website . RStudio (with module R) has been retired from Open OnDemand as of June 1st Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand. Milgram Maintenance The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details. Software Highlights QTLtools/1.3.1-foss-2020b is now available on Farnam. R/4.2.0-foss-2020b is available on all clusters. Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. Please check to see if any packages you need are available in these modules before running install.packages .","title":"2022 06"},{"location":"news/2022-06/#june-2022","text":"","title":"June 2022"},{"location":"news/2022-06/#announcements","text":"","title":"Announcements"},{"location":"news/2022-06/#farnam-decommission-mccleary-announcement","text":"After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website .","title":"Farnam Decommission & McCleary Announcement"},{"location":"news/2022-06/#rstudio-with-module-r-has-been-retired-from-open-ondemand-as-of-june-1st","text":"Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand.","title":"RStudio (with module R) has been retired from Open OnDemand as of June 1st"},{"location":"news/2022-06/#milgram-maintenance","text":"The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.","title":"Milgram Maintenance"},{"location":"news/2022-06/#software-highlights","text":"QTLtools/1.3.1-foss-2020b is now available on Farnam. R/4.2.0-foss-2020b is available on all clusters. Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. 
Please check to see if any packages you need are available in these modules before running install.packages .","title":"Software Highlights"},{"location":"news/2022-07/","text":"July 2022 Announcements Loomis Decommission After almost a decade in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documentation for more information and updates. Updates to OOD Jupyter App OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda envs in your account, the app now lists only the conda environments with Jupyter installed. If you do not see your desired environment listed in the dropdown, check that you have installed Jupyter in that environment. In addition, the \u201cjupyterlab\u201d checkbox in the app will only be visible if the environment selected has jupyterlab installed. YCRC conda environment ycrc_conda_env.list has been replaced by ycrc_conda_env.sh . To update your conda environments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update . Software Highlights miniconda/4.12.0 is now available on all clusters RStudio/2022.02.3-492 is now available on all clusters. This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b. fmriprep/21.0.2 is now available on Milgram. cellranger/7.0.0 is now available on Farnam.","title":"2022 07"},{"location":"news/2022-07/#july-2022","text":"","title":"July 2022"},{"location":"news/2022-07/#announcements","text":"","title":"Announcements"},{"location":"news/2022-07/#loomis-decommission","text":"After almost a decade in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documentation for more information and updates.","title":"Loomis Decommission"},{"location":"news/2022-07/#updates-to-ood-jupyter-app","text":"OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda envs in your account, the app now lists only the conda environments with Jupyter installed. If you do not see your desired environment listed in the dropdown, check that you have installed Jupyter in that environment. In addition, the \u201cjupyterlab\u201d checkbox in the app will only be visible if the environment selected has jupyterlab installed.","title":"Updates to OOD Jupyter App"},{"location":"news/2022-07/#ycrc-conda-environment","text":"ycrc_conda_env.list has been replaced by ycrc_conda_env.sh . To update your conda environments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update .","title":"YCRC conda environment"},{"location":"news/2022-07/#software-highlights","text":"miniconda/4.12.0 is now available on all clusters RStudio/2022.02.3-492 is now available on all clusters. 
This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b. fmriprep/21.0.2 is now available on Milgram. cellranger/7.0.0 is now available on Farnam.","title":"Software Highlights"},{"location":"news/2022-08-grace/","text":"Grace Maintenance August 2-4, 2022 Software Updates Security updates Slurm updated to 22.05.2 NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06) Singularity replaced by Apptainer version 1.0.3 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.26 Hardware Updates Core components of the ethernet network were upgraded to improve performance and increase overall capacity. Loomis Decommission and Project Data Migration After over eight years in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. Project. We have migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail. If you had a project space that exceeds the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages. Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling. Scratch60. The Loomis scratch space ( /gpfs/loomis/scratch60 ) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022 . Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch). Changes to Non-Interactive Sessions Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"2022 08 grace"},{"location":"news/2022-08-grace/#grace-maintenance","text":"August 2-4, 2022","title":"Grace Maintenance"},{"location":"news/2022-08-grace/#software-updates","text":"Security updates Slurm updated to 22.05.2 NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06) Singularity replaced by Apptainer version 1.0.3 (note: the \"singularity\" command will still work as expected) Lmod updated to 8.7 Open OnDemand updated to 2.0.26","title":"Software Updates"},{"location":"news/2022-08-grace/#hardware-updates","text":"Core components of the ethernet network were upgraded to improve performance and increase overall capacity.","title":"Hardware Updates"},{"location":"news/2022-08-grace/#loomis-decommission-and-project-data-migration","text":"After over eight years in service, the primary storage system on Grace, Loomis ( /gpfs/loomis ), will be retired later this year. Project. We have migrated all of the Loomis project space ( /gpfs/loomis/project ) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location ( /gpfs/gibbs/project// ). The \"project\" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail. If you had a project space that exceeds the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty \"project\" space with the default no-cost quota. Any scripts will need to be updated accordingly. Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation . R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/ ) and rerunning install.packages. Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling. Scratch60. The Loomis scratch space ( /gpfs/loomis/scratch60 ) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022 . Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch).","title":"Loomis Decommission and Project Data Migration"},{"location":"news/2022-08-grace/#changes-to-non-interactive-sessions","text":"Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. 
Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.","title":"Changes to Non-Interactive Sessions"},{"location":"news/2022-08/","text":"August 2022 Announcements Grace Maintenance & Storage Changes The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates . SpinUp Researcher Image & Containers Yale offers a simple portal for creating cloud-based compute resources called SpinUp . These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases. Part of this service is a Researcher Image, an Ubuntu-based system which contains a suite of pre-installed commonly utilized software utilities, including: - PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks - GCC, Cmake, Go, and other development tools - Singularity/Apptainer and Docker for container development We recommend researchers looking to develop containers for use on YCRC HPC resources to utilize SpinUp to build containers which can then be copied to the clusters. If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated. Software Highlights AFNI/2022.1.14 is now available on Farnam and Milgram. cellranger/7.0.0 is now available on Grace.","title":"2022 08"},{"location":"news/2022-08/#august-2022","text":"","title":"August 2022"},{"location":"news/2022-08/#announcements","text":"","title":"Announcements"},{"location":"news/2022-08/#grace-maintenance-storage-changes","text":"The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details. During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates .","title":"Grace Maintenance & Storage Changes"},{"location":"news/2022-08/#spinup-researcher-image-containers","text":"Yale offers a simple portal for creating cloud-based compute resources called SpinUp . These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases. Part of this service is a Researcher Image, an Ubuntu-based system which contains a suite of pre-installed commonly utilized software utilities, including: - PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks - GCC, Cmake, Go, and other development tools - Singularity/Apptainer and Docker for container development We recommend researchers looking to develop containers for use on YCRC HPC resources to utilize SpinUp to build containers which can then be copied to the clusters. 
If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated.","title":"SpinUp Researcher Image & Containers"},{"location":"news/2022-08/#software-highlights","text":"AFNI/2022.1.14 is now available on Farnam and Milgram. cellranger/7.0.0 is now available on Grace.","title":"Software Highlights"},{"location":"news/2022-09/","text":"September 2022 Announcements Software Module Extensions Our software module utility ( Lmod ) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command. $ module spider ggplot2 -------------------------------------------------------- ggplot2: -------------------------------------------------------- Versions: ggplot2/3.3.2 (E) ggplot2/3.3.3 (E) ggplot2/3.3.5 (E) $ module spider ggplot2/3.3.5 ----------------------------------------------------------- ggplot2: ggplot2/3.3.5 (E) ----------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. R/4.2.0-foss-2020b This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5 . Software Highlights topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.","title":"2022 09"},{"location":"news/2022-09/#september-2022","text":"","title":"September 2022"},{"location":"news/2022-09/#announcements","text":"","title":"Announcements"},{"location":"news/2022-09/#software-module-extensions","text":"Our software module utility ( Lmod ) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command. $ module spider ggplot2 -------------------------------------------------------- ggplot2: -------------------------------------------------------- Versions: ggplot2/3.3.2 (E) ggplot2/3.3.3 (E) ggplot2/3.3.5 (E) $ module spider ggplot2/3.3.5 ----------------------------------------------------------- ggplot2: ggplot2/3.3.5 (E) ----------------------------------------------------------- This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. 
R/4.2.0-foss-2020b This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5 .","title":"Software Module Extensions"},{"location":"news/2022-09/#software-highlights","text":"topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.","title":"Software Highlights"},{"location":"news/2022-10-farnam/","text":"Farnam Maintenance October 4-5, 2022 Software Updates Security updates Slurm updated to 22.05.3 NVIDIA drivers updated to 515.65.01 Lmod updated to 8.7 Apptainer updated to 1.0.3 Open OnDemand updated to 2.0.28 Hardware Updates No hardware changes during this maintenance.","title":"2022 10 farnam"},{"location":"news/2022-10-farnam/#farnam-maintenance","text":"October 4-5, 2022","title":"Farnam Maintenance"},{"location":"news/2022-10-farnam/#software-updates","text":"Security updates Slurm updated to 22.05.3 NVIDIA drivers updated to 515.65.01 Lmod updated to 8.7 Apptainer updated to 1.0.3 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-10-farnam/#hardware-updates","text":"No hardware changes during this maintenance.","title":"Hardware Updates"},{"location":"news/2022-10/","text":"October 2022 Announcements Farnam Maintenance The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. See the Farnam maintenance email announcements for more details. Gibbs Maintenance Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures. New Command for Interactive Jobs The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job. Instead of the clunky srun --pty bash syntax from previous versions, this is now replaced with salloc . In addition, the interactive partition is now the default partition for jobs launched using salloc . Thus a simple (1 core, 1 hour) interactive job can be requested like this: salloc which will submit the job and then move your shell to the allocated compute node. For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun : [user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1 salloc: Nodes p09r07n[24,28] are ready for job [user@p09r07n24 ~]$ srun hostname p09r07n24.grace.hpc.yale.internal P09r07n28.grace.hpc.yale.internal For more information on salloc , please refer to Slurm\u2019s documentation . Software Highlights cellranger/7.0.1 is now available on Farnam. LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.","title":"2022 10"},{"location":"news/2022-10/#october-2022","text":"","title":"October 2022"},{"location":"news/2022-10/#announcements","text":"","title":"Announcements"},{"location":"news/2022-10/#farnam-maintenance","text":"The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. 
See the Farnam maintenance email announcements for more details.","title":"Farnam Maintenance"},{"location":"news/2022-10/#gibbs-maintenance","text":"Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures.","title":"Gibbs Maintenance"},{"location":"news/2022-10/#new-command-for-interactive-jobs","text":"The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job. Instead of the clunky srun --pty bash syntax from previous versions, this is now replaced with salloc . In addition, the interactive partition is now the default partition for jobs launched using salloc . Thus a simple (1 core, 1 hour) interactive job can be requested like this: salloc which will submit the job and then move your shell to the allocated compute node. For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun : [user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1 salloc: Nodes p09r07n[24,28] are ready for job [user@p09r07n24 ~]$ srun hostname p09r07n24.grace.hpc.yale.internal P09r07n28.grace.hpc.yale.internal For more information on salloc , please refer to Slurm\u2019s documentation .","title":"New Command for Interactive Jobs"},{"location":"news/2022-10/#software-highlights","text":"cellranger/7.0.1 is now available on Farnam. LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.","title":"Software Highlights"},{"location":"news/2022-11-ruddle/","text":"Ruddle Maintenance November 1, 2022 Software Updates Security updates Slurm updated to 22.05.5 Apptainer updated to 1.1.2 Open OnDemand updated to 2.0.28 Hardware Updates No hardware changes during this maintenance.","title":"2022 11 ruddle"},{"location":"news/2022-11-ruddle/#ruddle-maintenance","text":"November 1, 2022","title":"Ruddle Maintenance"},{"location":"news/2022-11-ruddle/#software-updates","text":"Security updates Slurm updated to 22.05.5 Apptainer updated to 1.1.2 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-11-ruddle/#hardware-updates","text":"No hardware changes during this maintenance.","title":"Hardware Updates"},{"location":"news/2022-11/","text":"November 2022 Announcements Ruddle Maintenance The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details. Grace and Milgram Maintenance Schedule Change We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page. Requeue after Timeout The YCRC clusters all have maximum time-limits that sometimes are shorter than a job needs to finish. This can be a frustration for researchers trying to get a simulation or a project finished. 
However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as \"checkpointing\" and is built into many standard software tools, like Gaussian and Gromacs. Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal: #!/bin/bash #SBATCH -p day #SBATCH -t 24:00:00 #SBATCH -c 1 #SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes #SBATCH --requeue # mark this job eligible for requeueing # define a `trap` that catches the signal and requeues the job trap \"echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID} \" 10 # run the main code, with the `&` to \u201cbackground\u201d the task ./my_code.exe & # wait for either the main code to finish to receive the signal wait This tells Slurm to send SIGNAL10 at ~30s before the job finishes. Then we define an action (or trap ) based on this signal which requeues the job. Don\u2019t forget to add the & to the end of the main executable and the wait command so that the trap is able to catch the signal. Software Highlights MATLAB/2022b is now available on all clusters.","title":"2022 11"},{"location":"news/2022-11/#november-2022","text":"","title":"November 2022"},{"location":"news/2022-11/#announcements","text":"","title":"Announcements"},{"location":"news/2022-11/#ruddle-maintenance","text":"The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.","title":"Ruddle Maintenance"},{"location":"news/2022-11/#grace-and-milgram-maintenance-schedule-change","text":"We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page.","title":"Grace and Milgram Maintenance Schedule Change"},{"location":"news/2022-11/#requeue-after-timeout","text":"The YCRC clusters all have maximum time-limits that sometimes are shorter than a job needs to finish. This can be a frustration for researchers trying to get a simulation or a project finished. However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as \"checkpointing\" and is built into many standard software tools, like Gaussian and Gromacs. Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. 
Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal: #!/bin/bash #SBATCH -p day #SBATCH -t 24:00:00 #SBATCH -c 1 #SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes #SBATCH --requeue # mark this job eligible for requeueing # define a `trap` that catches the signal and requeues the job trap \"echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID} \" 10 # run the main code, with the `&` to \u201cbackground\u201d the task ./my_code.exe & # wait for either the main code to finish to receive the signal wait This tells Slurm to send SIGNAL10 at ~30s before the job finishes. Then we define an action (or trap ) based on this signal which requeues the job. Don\u2019t forget to add the & to the end of the main executable and the wait command so that the trap is able to catch the signal.","title":"Requeue after Timeout"},{"location":"news/2022-11/#software-highlights","text":"MATLAB/2022b is now available on all clusters.","title":"Software Highlights"},{"location":"news/2022-12-grace/","text":"Grace Maintenance December 6-8, 2022 Software Updates Slurm updated to 22.05.6 NVIDIA drivers updated to 520.61.05 Apptainer updated to 1.1.3 Open OnDemand updated to 2.0.28 Hardware Updates Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps) The InfiniBand network was modified to increase capacity and allow for additional growth Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth Loomis Decommission The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page .","title":"2022 12 grace"},{"location":"news/2022-12-grace/#grace-maintenance","text":"December 6-8, 2022","title":"Grace Maintenance"},{"location":"news/2022-12-grace/#software-updates","text":"Slurm updated to 22.05.6 NVIDIA drivers updated to 520.61.05 Apptainer updated to 1.1.3 Open OnDemand updated to 2.0.28","title":"Software Updates"},{"location":"news/2022-12-grace/#hardware-updates","text":"Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps) The InfiniBand network was modified to increase capacity and allow for additional growth Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth","title":"Hardware Updates"},{"location":"news/2022-12-grace/#loomis-decommission","text":"The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page .","title":"Loomis Decommission"},{"location":"news/2022-12/","text":"December 2022 Announcements Grace & Gibbs Maintenance The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details. Loomis Decommission The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance starting on December 6th. 
All data except for a few remaining private filesets have already been transferred to other systems (e.g., current software, home, scratch to Palmer and project to Gibbs). The remaining private filesets are being transferred to Gibbs in advance of the maintenance and owners should have received communications accordingly. The only potential user impact of the retirement is on anyone using the older, deprecated software trees. Otherwise, the Loomis retirement should have no user impact but please reach out if you have any concerns or believe you are still using data located on Loomis. See the Loomis Decommission documentation for more information. Apptainer Upgrade on Grace and Ruddle The newest version of Apptainer (v1.1, available now on Ruddle and, after December maintenance, on Grace) comes with the ability to create containers without needing elevated privileges (i.e. sudo access). This greatly simplifies the container workflow as you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command. For example, to create a simple toy container from this def file ( lolcow.def ): BootStrap: docker From: ubuntu:20.04 %post apt-get -y update apt-get -y install cowsay lolcat %environment export LC_ALL=C export PATH=/usr/games:$PATH %runscript date | cowsay | lolcat You can run: salloc -p interactive -c 4 apptainer build lolcow.sif lolcow.def This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers . Software Highlights RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.","title":"2022 12"},{"location":"news/2022-12/#december-2022","text":"","title":"December 2022"},{"location":"news/2022-12/#announcements","text":"","title":"Announcements"},{"location":"news/2022-12/#grace-gibbs-maintenance","text":"The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details.","title":"Grace & Gibbs Maintenance"},{"location":"news/2022-12/#loomis-decommission","text":"The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance starting on December 6th. All data except for a few remaining private filesets have already been transferred to other systems (e.g., current software, home, scratch to Palmer and project to Gibbs). The remaining private filesets are being transferred to Gibbs in advance of the maintenance and owners should have received communications accordingly. The only potential user impact of the retirement is on anyone using the older, deprecated software trees. Otherwise, the Loomis retirement should have no user impact but please reach out if you have any concerns or believe you are still using data located on Loomis. 
See the Loomis Decommission documentation for more information.","title":"Loomis Decommission"},{"location":"news/2022-12/#apptainer-upgrade-on-grace-and-ruddle","text":"The newest version of Apptainer (v1.1, available now on Ruddle and, after December maintenance, on Grace) comes with the ability to create containers without needing elevated privileges (i.e. sudo access). This greatly simplifies the container workflow as you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command. For example, to create a simple toy container from this def file ( lolcow.def ): BootStrap: docker From: ubuntu:20.04 %post apt-get -y update apt-get -y install cowsay lolcat %environment export LC_ALL=C export PATH=/usr/games:$PATH %runscript date | cowsay | lolcat You can run: salloc -p interactive -c 4 apptainer build lolcow.sif lolcow.def This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers .","title":"Apptainer Upgrade on Grace and Ruddle"},{"location":"news/2022-12/#software-highlights","text":"RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.","title":"Software Highlights"},{"location":"news/2023-01/","text":"January 2023 Announcements Open OnDemand VSCode A new OOD app code-server is now available on all clusters, except Milgram (coming in Feb). Code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server immediately. The app allows you to use GPUs, to allocate large amounts of memory, and to specify a private partition (if you have access), things you won\u2019t be able to do if you run VSCode on a login node. 
The app is still in beta version and your feedback is much appreciated.","title":"Open OnDemand VSCode"},{"location":"news/2023-01/#milgram-transfer-node","text":"Milgram now has a node dedicated to data transfers to and from the cluster. To access the node from within Milgram, run ssh transfer from the login node. To upload or download data from Milgram via the transfer node, use the hostname transfer-milgram.hpc.yale.edu (must be on VPN). More information can be found in our Transfer Data documentation . With the addition of the new transfer node, we ask that the login nodes are no longer used for data transfers to limit impact on regular login activities.","title":"Milgram Transfer Node"},{"location":"news/2023-02-milgram/","text":"Milgram Maintenance February 7, 2023 Software Updates Slurm updated to 22.05.7 NVIDIA drivers updated to 525.60.13 Apptainer updated to 1.1.4 Open OnDemand updated to 2.0.29 Hardware Updates Milgram\u2019s network was restructured to reduce latency, and improve resiliency.","title":"2023 02 milgram"},{"location":"news/2023-02-milgram/#milgram-maintenance","text":"February 7, 2023","title":"Milgram Maintenance"},{"location":"news/2023-02-milgram/#software-updates","text":"Slurm updated to 22.05.7 NVIDIA drivers updated to 525.60.13 Apptainer updated to 1.1.4 Open OnDemand updated to 2.0.29","title":"Software Updates"},{"location":"news/2023-02-milgram/#hardware-updates","text":"Milgram\u2019s network was restructured to reduce latency, and improve resiliency.","title":"Hardware Updates"},{"location":"news/2023-02/","text":"February 2023 Announcements Milgram Maintenance The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details. McCleary Launch The YCRC is pleased to announce the launch of the new McCleary HPC cluster. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. McCleary will be available in a \u201cbeta\u201d phase to Farnam and Ruddle users later on this month. Keep an eye on your email for further announcements about McCleary\u2019s availability.","title":"2023 02"},{"location":"news/2023-02/#february-2023","text":"","title":"February 2023"},{"location":"news/2023-02/#announcements","text":"","title":"Announcements"},{"location":"news/2023-02/#milgram-maintenance","text":"The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.","title":"Milgram Maintenance"},{"location":"news/2023-02/#mccleary-launch","text":"The YCRC is pleased to announce the launch of the new McCleary HPC cluster. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. McCleary will be available in a \u201cbeta\u201d phase to Farnam and Ruddle users later on this month. 
Keep an eye on your email for further announcements about McCleary\u2019s availability.","title":"McCleary Launch"},{"location":"news/2023-03/","text":"March 2023 Announcements McCleary Now Available The new McCleary HPC cluster is now available for active Farnam and Ruddle users\u2013all other researchers who conduct life sciences research can request an account using our Account Request form . Farnam and Ruddle will be retired in mid-2023 so we encourage all users on those clusters to transition their work to McCleary at your earliest convenience. If you see any issues on the new cluster or have any questions, please let us know at hpc@yale.edu . Open OnDemand VSCode Available Everywhere A new OOD app code-server is now available on all YCRC clusters, including Milgram and McCleary. The new code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server at their earliest convenience. Unlike VSCode on the login node, the new app also enables you to use GPUs, to allocate large memory nodes, and to specify a private partition (if applicable) The app is still in beta version and your feedback is much appreciated. Software Highlights GPU-enabled LAMMPS ( LAMMPS/23Jun2022-foss-2020b-kokkos-CUDA-11.3.1 ) is now available on Grace. AlphaFold/2.3.1-fosscuda-2020b is now available on Farnam and McCleary.","title":"2023 03"},{"location":"news/2023-03/#march-2023","text":"","title":"March 2023"},{"location":"news/2023-03/#announcements","text":"","title":"Announcements"},{"location":"news/2023-03/#mccleary-now-available","text":"The new McCleary HPC cluster is now available for active Farnam and Ruddle users\u2013all other researchers who conduct life sciences research can request an account using our Account Request form . Farnam and Ruddle will be retired in mid-2023 so we encourage all users on those clusters to transition their work to McCleary at your earliest convenience. If you see any issues on the new cluster or have any questions, please let us know at hpc@yale.edu .","title":"McCleary Now Available"},{"location":"news/2023-03/#open-ondemand-vscode-available-everywhere","text":"A new OOD app code-server is now available on all YCRC clusters, including Milgram and McCleary. The new code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server at their earliest convenience. Unlike VSCode on the login node, the new app also enables you to use GPUs, to allocate large memory nodes, and to specify a private partition (if applicable) The app is still in beta version and your feedback is much appreciated.","title":"Open OnDemand VSCode Available Everywhere"},{"location":"news/2023-03/#software-highlights","text":"GPU-enabled LAMMPS ( LAMMPS/23Jun2022-foss-2020b-kokkos-CUDA-11.3.1 ) is now available on Grace. AlphaFold/2.3.1-fosscuda-2020b is now available on Farnam and McCleary.","title":"Software Highlights"},{"location":"news/2023-04/","text":"April 2023 Announcements McCleary in Production Status During March, we have been adding nodes to McCleary, including large memory nodes (4 TiB), GPU nodes and migrating most of the commons nodes from Farnam to McCleary (that are not being retired). Moreover, we have finalized the setup of McCleary and the system is now production stable. 
Please feel free to migrate your data and workloads from Farnam and Ruddle to McCleary at your earliest convenience. New YCGA Nodes Online on McCleary McCleary now has over 3000 new cores dedicated to YCGA work! We encourage you to test your workloads and prepare to migrate from Ruddle to McCleary at your earliest convenience. More information can be found here . Software Highlights QuantumESPRESSO/7.1-intel-2020b available on Grace RELION/4.0.1 available on McCleary miniconda/23.1.0 available on all clusters scikit-learn/0.23.2-foss-2020b on Grace and McCleary seff-array updated to 0.4 on Grace, McCleary and Milgram","title":"2023 04"},{"location":"news/2023-04/#april-2023","text":"","title":"April 2023"},{"location":"news/2023-04/#announcements","text":"","title":"Announcements"},{"location":"news/2023-04/#mccleary-in-production-status","text":"During March, we have been adding nodes to McCleary, including large memory nodes (4 TiB), GPU nodes and migrating most of the commons nodes from Farnam to McCleary (that are not being retired). Moreover, we have finalized the setup of McCleary and the system is now production stable. Please feel free to migrate your data and workloads from Farnam and Ruddle to McCleary at your earliest convenience.","title":"McCleary in Production Status"},{"location":"news/2023-04/#new-ycga-nodes-online-on-mccleary","text":"McCleary now has over 3000 new cores dedicated to YCGA work! We encourage you to test your workloads and prepare to migrate from Ruddle to McCleary at your earliest convenience. More information can be found here .","title":"New YCGA Nodes Online on McCleary"},{"location":"news/2023-04/#software-highlights","text":"QuantumESPRESSO/7.1-intel-2020b available on Grace RELION/4.0.1 available on McCleary miniconda/23.1.0 available on all clusters scikit-learn/0.23.2-foss-2020b on Grace and McCleary seff-array updated to 0.4 on Grace, McCleary and Milgram","title":"Software Highlights"},{"location":"news/2023-05-23/","text":"Upcoming Maintenances The McCleary cluster will be unavailable from 9am-1pm on Tuesday May 30 while maintenance is performed on the YCGA storage. The Milgram, Grace and McCleary clusters will not be available from 2pm on Monday June 19 until 10am on Wednesday June 21, due to electrical work being performed in the HPC data center. No changes will be made that impact users of the clusters. The regular Grace maintenance that had been scheduled for June 6-8 will be performed on August 15-17. This change is being made in preparation for the upgrade to RHEL 8 on Grace.","title":"2023 05 23"},{"location":"news/2023-05-23/#upcoming-maintenances","text":"The McCleary cluster will be unavailable from 9am-1pm on Tuesday May 30 while maintenance is performed on the YCGA storage. The Milgram, Grace and McCleary clusters will not be available from 2pm on Monday June 19 until 10am on Wednesday June 21, due to electrical work being performed in the HPC data center. No changes will be made that impact users of the clusters. The regular Grace maintenance that had been scheduled for June 6-8 will be performed on August 15-17. This change is being made in preparation for the upgrade to RHEL 8 on Grace.","title":"Upcoming Maintenances"},{"location":"news/2023-05/","text":"May 2023 Announcements Farnam Decommission: June 1, 2023 After many years of supporting productive science, the Farnam cluster will be decommissioned this summer as we transition to the newly deployed McCleary cluster. 
Logins will be disabled June 1, 2023, which will mark the official end of Farnam\u2019s service. Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. All data on YSM (that you want to keep) will need to be transferred off YSM, either to non-HPC storage or to McCleary project space by you prior to YSM\u2019s retirement. Ruddle Decommission: July 1, 2023 After many years of serving YCGA, the Ruddle cluster will also be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled July 1, 2023, which will mark the official end of Ruddle\u2019s service. We will be migrating project and sequencing directories from Ruddle to McCleary. However, you are responsible for moving home and scratch data to McCleary before July 1, 2023. Please begin to migrate your data and workloads to McCleary at your earliest convenience and reach out with any questions. McCleary Transition Reminder With our McCleary cluster now in a production stable state, we ask all Farnam users to ensure all home, project and scratch data the group wishes to keep is migrated to the new cluster ahead of the June 1st decommission. As June 1st is the formal retirement of Farnam, compute service charges on McCleary commons partitions will begin at this time. Ruddle users will have until July 1st to access the Ruddle and migrate their home and scratch data as needed. Ruddle users will NOT need to migrate their project directories; those will be automatically transferred to McCleary. As previously established on Ruddle, all jobs in the YCGA partitions will be exempt from compute service charges on the new cluster. For more information visit our McCleary Transition documentation . Software Highlights Libmamba solver for conda 23.1.0+ available on all clusters. Conda installations 23.1.0 and newer are now configured to use the faster environment solving algorithm developed by mamba by default. You can simply use conda install and enjoy the significantly faster solve times. GSEA available in McCleary and Ruddle OOD. Gene Set Enrichment Analysis (GSEA) is now available in McCleary OOD and Ruddle OOD for all users. You can access it by clicking \u201cInteractive Apps'' and then selecting \u201cGSEA\u201d. GSEA is a popular computational method to do functional analysis of multi omics data. Data files for GSEA are not centrally stored on the clusters, so you will need to download them from the GSEA website by yourself. NAG/29-GCCcore-11.2.0 available on Grace AFNI/2023.1.01-foss-2020b-Python-3.8.6 on McCleary","title":"2023 05"},{"location":"news/2023-05/#may-2023","text":"","title":"May 2023"},{"location":"news/2023-05/#announcements","text":"","title":"Announcements"},{"location":"news/2023-05/#farnam-decommission-june-1-2023","text":"After many years of supporting productive science, the Farnam cluster will be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled June 1, 2023, which will mark the official end of Farnam\u2019s service. Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. 
All data on YSM (that you want to keep) will need to be transferred off YSM, either to non-HPC storage or to McCleary project space by you prior to YSM\u2019s retirement.","title":"Farnam Decommission: June 1, 2023"},{"location":"news/2023-05/#ruddle-decommission-july-1-2023","text":"After many years of serving YCGA, the Ruddle cluster will also be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled July 1, 2023, which will mark the official end of Ruddle\u2019s service. We will be migrating project and sequencing directories from Ruddle to McCleary. However, you are responsible for moving home and scratch data to McCleary before July 1, 2023. Please begin to migrate your data and workloads to McCleary at your earliest convenience and reach out with any questions.","title":"Ruddle Decommission: July 1, 2023"},{"location":"news/2023-05/#mccleary-transition-reminder","text":"With our McCleary cluster now in a production stable state, we ask all Farnam users to ensure all home, project and scratch data the group wishes to keep is migrated to the new cluster ahead of the June 1st decommission. As June 1st is the formal retirement of Farnam, compute service charges on McCleary commons partitions will begin at this time. Ruddle users will have until July 1st to access the Ruddle and migrate their home and scratch data as needed. Ruddle users will NOT need to migrate their project directories; those will be automatically transferred to McCleary. As previously established on Ruddle, all jobs in the YCGA partitions will be exempt from compute service charges on the new cluster. For more information visit our McCleary Transition documentation .","title":"McCleary Transition Reminder"},{"location":"news/2023-05/#software-highlights","text":"Libmamba solver for conda 23.1.0+ available on all clusters. Conda installations 23.1.0 and newer are now configured to use the faster environment solving algorithm developed by mamba by default. You can simply use conda install and enjoy the significantly faster solve times. GSEA available in McCleary and Ruddle OOD. Gene Set Enrichment Analysis (GSEA) is now available in McCleary OOD and Ruddle OOD for all users. You can access it by clicking \u201cInteractive Apps'' and then selecting \u201cGSEA\u201d. GSEA is a popular computational method to do functional analysis of multi omics data. Data files for GSEA are not centrally stored on the clusters, so you will need to download them from the GSEA website by yourself. NAG/29-GCCcore-11.2.0 available on Grace AFNI/2023.1.01-foss-2020b-Python-3.8.6 on McCleary","title":"Software Highlights"},{"location":"news/2023-06/","text":"June 2023 Announcements McCleary Officially Launches Today marks the official beginning of the McCleary cluster\u2019s service. In addition to compute nodes migrated from Farnam and Ruddle, McCleary features our first set of direct-to-chip liquid cooled (DLC) nodes, moving YCRC into a more environmentally friendly future. McCleary is significantly larger than the Farnam and Ruddle clusters combined. The new DLC compute nodes are able to run faster and with higher CPU density due to their superior cooling system. McCleary is named for Beatrix McCleary Hamburg, who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine. 
Farnam Farewell: June 1, 2023 On the occasion of decommissioning the Farnam cluster on June 1, YCRC would like to acknowledge the profound impact Farnam has had on computing at Yale. Farnam supported biomedical computing at YSM and across the University providing compute resources to hundreds of research groups. Farnam replaced the previous biomedical cluster Louise, and began production in October 2016. Since then, it has run user jobs comprising more than 139 million compute hours. Farnam is replaced by the new cluster McCleary. Please note: Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. For more information see McCleary transfer documentation . Ruddle Decommission: July 1, 2023 The Ruddle cluster will be decommissioned and access will be disabled July 1, 2023. We will be migrating project and sequencing directories from Ruddle to McCleary. Please note: Users are responsible for moving home and scratch data to McCleary prior to July 1, 2023. For more information and instructions, see our McCleary transfer documentation . Software Highlights R/4.3.0-foss-2020b+ available on all clusters. The newest version of R is now available on Grace, McCleary, and Milgram. This updates nearly 1000 packages and can be used in batch jobs and in RStudio sessions via Open OnDemand. AlphaFold/2.3.2-foss-2020b-CUDA-11.3.1 The latest version of AlphaFold (2.3.2, released in April) has been installed on McCleary and is ready for use. This version fixes a number of bugs and should improve GPU memory usage enabling longer proteins to be studied. LAMMPS/23Jun2022-foss-2020b-kokkos available on McCleary RevBayes/1.2.1-GCC-10.2.0 available on McCleary Spark 3.1.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"2023 06"},{"location":"news/2023-06/#june-2023","text":"","title":"June 2023"},{"location":"news/2023-06/#announcements","text":"","title":"Announcements"},{"location":"news/2023-06/#mccleary-officially-launches","text":"Today marks the official beginning of the McCleary cluster\u2019s service. In addition to compute nodes migrated from Farnam and Ruddle, McCleary features our first set of direct-to-chip liquid cooled (DLC) nodes, moving YCRC into a more environmentally friendly future. McCleary is significantly larger than the Farnam and Ruddle clusters combined. The new DLC compute nodes are able to run faster and with higher CPU density due to their superior cooling system. McCleary is named for Beatrix McCleary Hamburg, who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine.","title":"McCleary Officially Launches"},{"location":"news/2023-06/#farnam-farewell-june-1-2023","text":"On the occasion of decommissioning the Farnam cluster on June 1, YCRC would like to acknowledge the profound impact Farnam has had on computing at Yale. Farnam supported biomedical computing at YSM and across the University providing compute resources to hundreds of research groups. Farnam replaced the previous biomedical cluster Louise, and began production in October 2016. Since then, it has run user jobs comprising more than 139 million compute hours. Farnam is replaced by the new cluster McCleary. Please note: Read-only access to Farnam\u2019s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. 
For more information see McCleary transfer documentation .","title":"Farnam Farewell: June 1, 2023"},{"location":"news/2023-06/#ruddle-decommission-july-1-2023","text":"The Ruddle cluster will be decommissioned and access will be disabled July 1, 2023. We will be migrating project and sequencing directories from Ruddle to McCleary. Please note: Users are responsible for moving home and scratch data to McCleary prior to July 1, 2023. For more information and instructions, see our McCleary transfer documentation .","title":"Ruddle Decommission: July 1, 2023"},{"location":"news/2023-06/#software-highlights","text":"R/4.3.0-foss-2020b+ available on all clusters. The newest version of R is now available on Grace, McCleary, and Milgram. This updates nearly 1000 packages and can be used in batch jobs and in RStudio sessions via Open OnDemand. AlphaFold/2.3.2-foss-2020b-CUDA-11.3.1 The latest version of AlphaFold (2.3.2, released in April) has been installed on McCleary and is ready for use. This version fixes a number of bugs and should improve GPU memory usage enabling longer proteins to be studied. LAMMPS/23Jun2022-foss-2020b-kokkos available on McCleary RevBayes/1.2.1-GCC-10.2.0 available on McCleary Spark 3.1.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"Software Highlights"},{"location":"news/2023-07/","text":"July 2023 Announcements Red Hat 8 Test partitions on Grace As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster to RHEL8 during the August 15th-17th maintenance. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters While we have performed extensive testing, both internally and with the new McCleary cluster, we recognize that there are large numbers of custom workflows on Grace that may need to be modified to work with the new operating system. Please note: To enable debugging and testing of workflows ahead of the scheduled maintenance, we have set aside rhel8_day , rhel8_gpu , and rhel8_mpi partitions. You should access them from the rhel8_login node. Two-factor Authentication for McCleary To assure the security of the cluster and associated services, we have implemented two-factor authentication on the McCleary cluster. To simplify the transition, we have collected a set of best-practices and configurations of many of the commonly used access tools, including CyberDuck, MobaXTerm, and WinSCPon, which you can access on our docs page . If you are using other tools and experiencing issues, please contact us for assistance. New GPU Nodes on McCleary and Grace We have installed new GPU nodes for McCleary and Grace, dramatically increasing the number of GPUs available on both clusters. McCleary has 14 new nodes (56 GPUs) added to the gpu partition and six nodes (24 GPUs) added to pi_cryoem . Grace has 12 new nodes, available in the rhel8_gpu partition. Each of the new nodes contains 4 NVIDIA A5000 GPUs , with 24GB of on-board VRAM and PCIe4 connection to improve data-transport time. 
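As a rough sketch of how a job could target the new GPU nodes described above: the partition names (gpu on McCleary, rhel8_gpu on Grace) come from the announcement itself, but the salloc flags and resource sizes below are generic Slurm usage and placeholders, not an official YCRC recipe. # Hedged example: request a single GPU on Grace's rhel8_gpu partition (core, memory and time values are placeholders) salloc -p rhel8_gpu --gpus=1 --cpus-per-task=4 --mem=16G -t 2:00:00 # Once the allocation starts, confirm the GPU is visible nvidia-smi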
Software Highlights MATLAB/2023a available on all clusters Beast/2.7.4-GCC-12.2.0 available on McCleary AFNI/2023.1.07-foss-2020b available on McCleary FSL 6.0.5.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"2023 07"},{"location":"news/2023-07/#july-2023","text":"","title":"July 2023"},{"location":"news/2023-07/#announcements","text":"","title":"Announcements"},{"location":"news/2023-07/#red-hat-8-test-partitions-on-grace","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster to RHEL8 during the August 15th-17th maintenance. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters While we have performed extensive testing, both internally and with the new McCleary cluster, we recognize that there are large numbers of custom workflows on Grace that may need to be modified to work with the new operating system. Please note: To enable debugging and testing of workflows ahead of the scheduled maintenance, we have set aside rhel8_day , rhel8_gpu , and rhel8_mpi partitions. You should access them from the rhel8_login node.","title":"Red Hat 8 Test partitions on Grace"},{"location":"news/2023-07/#two-factor-authentication-for-mccleary","text":"To assure the security of the cluster and associated services, we have implemented two-factor authentication on the McCleary cluster. To simplify the transition, we have collected a set of best-practices and configurations of many of the commonly used access tools, including CyberDuck, MobaXTerm, and WinSCPon, which you can access on our docs page . If you are using other tools and experiencing issues, please contact us for assistance.","title":"Two-factor Authentication for McCleary"},{"location":"news/2023-07/#new-gpu-nodes-on-mccleary-and-grace","text":"We have installed new GPU nodes for McCleary and Grace, dramatically increasing the number of GPUs available on both clusters. McCleary has 14 new nodes (56 GPUs) added to the gpu partition and six nodes (24 GPUs) added to pi_cryoem . Grace has 12 new nodes, available in the rhel8_gpu partition. Each of the new nodes contains 4 NVIDIA A5000 GPUs , with 24GB of on-board VRAM and PCIe4 connection to improve data-transport time.","title":"New GPU Nodes on McCleary and Grace"},{"location":"news/2023-07/#software-highlights","text":"MATLAB/2023a available on all clusters Beast/2.7.4-GCC-12.2.0 available on McCleary AFNI/2023.1.07-foss-2020b available on McCleary FSL 6.0.5.1 (CPU-only and GPU-enabled versions) available on McCleary","title":"Software Highlights"},{"location":"news/2023-08-grace/","text":"Grace Maintenance August 15-17, 2023 Software Updates Red Hat Enterprise Linux (RHEL) updated to 8.8 Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.2 Open OnDemand updated to 2.0.32 Upgrade to Red Hat 8 As part of this maintenance, the operating system on Grace has been upgraded to Red Hat 8. A new unified software tree that is shared with the McCleary cluster has been created. The ssh host keys for Grace's login nodes were changed during the maintenance, which will result in a \"WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!\" error when you attempt to login. 
To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the list related to Grace. For MobaXterm, this file is located (by default) in Documents/MobaXterm/home/.ssh . Then attempt a new login and accept the new host key. New Open OnDemand (Web Portal) URL The new URL for the Grace Open OnDemand web portal is https://ood-grace.ycrc.yale.edu .","title":"2023 08 grace"},{"location":"news/2023-08-grace/#grace-maintenance","text":"August 15-17, 2023","title":"Grace Maintenance"},{"location":"news/2023-08-grace/#software-updates","text":"Red Hat Enterprise Linux (RHEL) updated to 8.8 Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.2 Open OnDemand updated to 2.0.32","title":"Software Updates"},{"location":"news/2023-08-grace/#upgrade-to-red-hat-8","text":"As part of this maintenance, the operating system on Grace has been upgraded to Red Hat 8. A new unified software tree that is shared with the McCleary cluster has been created. The ssh host keys for Grace's login nodes were changed during the maintenance, which will result in a \"WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!\" error when you attempt to login. To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line): ssh-keygen -R grace.hpc.yale.edu If you are using a GUI, such as MobaXterm, you will need to manually edit your known host file and remove the list related to Grace. For MobaXterm, this file is located (by default) in Documents/MobaXterm/home/.ssh . Then attempt a new login and accept the new host key.","title":"Upgrade to Red Hat 8"},{"location":"news/2023-08-grace/#new-open-ondemand-web-portal-url","text":"The new URL for the Grace Open OnDemand web portal is https://ood-grace.ycrc.yale.edu .","title":"New Open OnDemand (Web Portal) URL"},{"location":"news/2023-08-milgram/","text":"Milgram Maintenance August 22, 2023_ Software Updates Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.42 Open OnDemand updated to 2.0.32 Multi-Factor Authentication Multi-factor authentication is now required for ssh for all users on Milgram. For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"2023 08 milgram"},{"location":"news/2023-08-milgram/#milgram-maintenance","text":"August 22, 2023_","title":"Milgram Maintenance"},{"location":"news/2023-08-milgram/#software-updates","text":"Slurm updated to 22.05.9 NVIDIA drivers updated to 535.86.10 Apptainer updated to 1.2.42 Open OnDemand updated to 2.0.32","title":"Software Updates"},{"location":"news/2023-08-milgram/#multi-factor-authentication","text":"Multi-factor authentication is now required for ssh for all users on Milgram. For most usage, this additional step is minimally invasive and makes our clusters much more secure. 
However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation .","title":"Multi-Factor Authentication"},{"location":"news/2023-08/","text":"August 2023 Announcements Ruddle Farewell: July 24, 2023 On the occasion of decommissioning the Ruddle cluster on July 24, the Yale Center for Genome Analysis (YCGA) and the Yale Center for Research Computing (YCRC) would like to acknowledge the profound impact Ruddle has had on computing at Yale. Ruddle provided the compute resources for YCGA's high throughput sequencing and supported genomic computing for hundreds of research groups at YSM and across the University. In February 2016, Ruddle replaced the previous biomedical cluster BulldogN. Since then, it has run more than 24 million user jobs comprising more than 73 million compute hours. Funding for Ruddle came from NIH grant 1S10OD018521-01, with Shrikant Mane as PI. Ruddle is replaced by a dedicated partition and storage on the new McCleary cluster, which were funded by NIH grant 1S10OD030363-01A1, also awarded to Dr. Mane. Upcoming Grace Maintenance: August 15-17, 2023 Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023. Upcoming Milgram Maintenance: August 22-24, 2023 Scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023. Grace Operating System Upgrade As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters Three test partitions are available ( rhel8_day , rhel8_gpu , and rhel8_mpi ) for use in debugging workflows before the upgrade. These partitions should be accessed from the rhel8_login node. Software Highlights Julia/1.9.2-linux-x86_64 available on Grace Kraken2/2.1.3-gompi-2020b available on McCleary QuantumESPRESSO/7.0-intel-2020b available on Grace","title":"2023 08"},{"location":"news/2023-08/#august-2023","text":"","title":"August 2023"},{"location":"news/2023-08/#announcements","text":"","title":"Announcements"},{"location":"news/2023-08/#ruddle-farewell-july-24-2023","text":"On the occasion of decommissioning the Ruddle cluster on July 24, the Yale Center for Genome Analysis (YCGA) and the Yale Center for Research Computing (YCRC) would like to acknowledge the profound impact Ruddle has had on computing at Yale. Ruddle provided the compute resources for YCGA's high throughput sequencing and supported genomic computing for hundreds of research groups at YSM and across the University. In February 2016, Ruddle replaced the previous biomedical cluster BulldogN. Since then, it has run more than 24 million user jobs comprising more than 73 million compute hours. Funding for Ruddle came from NIH grant 1S10OD018521-01, with Shrikant Mane as PI. 
Ruddle is replaced by a dedicated partition and storage on the new McCleary cluster, which were funded by NIH grant 1S10OD030363-01A1, also awarded to Dr. Mane.","title":"Ruddle Farewell: July 24, 2023"},{"location":"news/2023-08/#upcoming-grace-maintenance-august-15-17-2023","text":"Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023.","title":"Upcoming Grace Maintenance: August 15-17, 2023"},{"location":"news/2023-08/#upcoming-milgram-maintenance-august-22-24-2023","text":"Scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023.","title":"Upcoming Milgram Maintenance: August 22-24, 2023"},{"location":"news/2023-08/#grace-operating-system-upgrade","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This will bring Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters Three test partitions are available ( rhel8_day , rhel8_gpu , and rhel8_mpi ) for use in debugging workflows before the upgrade. These partitions should be accessed from the rhel8_login node.","title":"Grace Operating System Upgrade"},{"location":"news/2023-08/#software-highlights","text":"Julia/1.9.2-linux-x86_64 available on Grace Kraken2/2.1.3-gompi-2020b available on McCleary QuantumESPRESSO/7.0-intel-2020b available on Grace","title":"Software Highlights"},{"location":"news/2023-09/","text":"September 2023 Announcements Grace RHEL8 Upgrade As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we upgraded the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This brings Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters There are a small number of compute nodes in the legacy partition with the old RHEL7 operating system installed for workloads that still need to be migrated. We expect to retire this partition during the Grace December 2023 maintenance. Please contact us if you need help upgrading to RHEL8 in the coming months. Grace Old Software Deprecation The RHEL7 application module tree ( /gpfs/loomis/apps/avx ) is now deprecated and will be removed from the default module environment during the Grace December maintenance. The software will still be available on Grace, but YCRC will no longer provide support for those old packages after December. If you are using a software package in that tree that is not yet installed into the new shared module tree, please let us know as soon as possible so we can help avoid any disruptions. 
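A possible quick check for whether a workflow still pulls software from the deprecated RHEL7 tree noted above; this is a sketch that assumes Lmod-style module commands and uses a placeholder module name. # List what is currently loaded module list # Show where a module resolves from (placeholder name); a match below means it still comes from the old RHEL7 tree module show SomePackage/1.0 2>&1 | grep '/gpfs/loomis/apps/avx'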
Software Highlights intel/2022b toolchain is now available on Grace and McCleary MKL 2022.2.1 Intel MPI 2022.2.1 Intel Compilers 2022.2.1 foss/2022b toolchain is now available on Grace and McCleary FFTW 3.3.10 ScaLAPACK 2.2.0 OpenMPI 4.1.4 GCC 12.2.0","title":"2023 09"},{"location":"news/2023-09/#september-2023","text":"","title":"September 2023"},{"location":"news/2023-09/#announcements","text":"","title":"Announcements"},{"location":"news/2023-09/#grace-rhel8-upgrade","text":"As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we upgraded the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This brings Grace in line with McCleary and provide a number of key benefits: continued security patches and support beyond 2023 updated system libraries to better support modern software improved node management system to facilitate the growing number of nodes on Grace shared application tree between McCleary and Grace, which brings software parity between clusters There are a small number of compute nodes in the legacy partition with the old RHEL7 operating system installed for workloads that still need to be migrated. We expect to retire this partition during the Grace December 2023 maintenance. Please contact us if you need help upgrading to RHEL8 in the coming months.","title":"Grace RHEL8 Upgrade"},{"location":"news/2023-09/#grace-old-software-deprecation","text":"The RHEL7 application module tree ( /gpfs/loomis/apps/avx ) is now deprecated and will be removed from the default module environment during the Grace December maintenance. The software will still be available on Grace, but YCRC will no longer provide support for those old packages after December. If you are using a software package in that tree that is not yet installed into the new shared module tree, please let us know as soon as possible so we can help avoid any disruptions.","title":"Grace Old Software Deprecation"},{"location":"news/2023-09/#software-highlights","text":"intel/2022b toolchain is now available on Grace and McCleary MKL 2022.2.1 Intel MPI 2022.2.1 Intel Compilers 2022.2.1 foss/2022b toolchain is now available on Grace and McCleary FFTW 3.3.10 ScaLAPACK 2.2.0 OpenMPI 4.1.4 GCC 12.2.0","title":"Software Highlights"},{"location":"news/2023-10-mccleary/","text":"McCleary Maintenance October 3-5, 2023_ Software Updates Slurm updated to 23.02.5 NVIDIA drivers updated to 535.104.12 Lmod updated to 8.7.30 Apptainer updated to 1.2.3 System Python updated to 3.11","title":"2023 10 mccleary"},{"location":"news/2023-10-mccleary/#mccleary-maintenance","text":"October 3-5, 2023_","title":"McCleary Maintenance"},{"location":"news/2023-10-mccleary/#software-updates","text":"Slurm updated to 23.02.5 NVIDIA drivers updated to 535.104.12 Lmod updated to 8.7.30 Apptainer updated to 1.2.3 System Python updated to 3.11","title":"Software Updates"},{"location":"news/2023-10/","text":"October 2023 Announcements McCleary Maintenance The biannual scheduled maintenance for the McCleary cluster will be occurring Oct 3-5. During this time, the cluster will be unavailable. See the McCleary maintenance email announcements for more details. Interactive jobs on day on McCleary Interactive jobs are now allowed to be run on the day partition on McCleary. Note you are still limited to 4 interactive-style jobs of any kind (salloc or OpenOnDemand) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. 
To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal. \"Papermill\" for Jupyter Command-Line Execution Many scientific workflows start as interactive Jupyter notebooks, and our Open OnDemand portal has dramatically simplified deploying these notebooks on cluster resources. However, the step from running notebooks interactively to running jobs as a batch script can be challenging and is often a barrier to migrating to using sbatch to run workflows non-interactively. To help solve this problem, there are a handful of utilities that can execute a notebook as if you were manually hitting \"shift-Enter\" for each cell. Of note is Papermill which provides a powerful set of tools to bridge between interactive and batch-mode computing. To get started, install papermill into your Conda environments: module load miniconda conda install papermill Then you can simply evaluate a notebook, preserving figures and output inside the notebook, like this: papermill /path/to/notebook.ipynb This can be run inside a batch job that might look like this: #!/bin/bash #SBATCH -p day #SBATCH -c 1 #SBATCH -t 6:00:00 module purge module load miniconda conda activate my_env papermill /path/to/notebook.ipynb Variables can also be parameterized and passed in as command-line options so that you can run multiple copies simultaneously with different input variables. For more information see the [Papermill docs pages](https://papermill.readthedocs.io/).","title":"2023 10"},{"location":"news/2023-10/#october-2023","text":"","title":"October 2023"},{"location":"news/2023-10/#announcements","text":"","title":"Announcements"},{"location":"news/2023-10/#mccleary-maintenance","text":"The biannual scheduled maintenance for the McCleary cluster will be occurring Oct 3-5. During this time, the cluster will be unavailable. See the McCleary maintenance email announcements for more details.","title":"McCleary Maintenance"},{"location":"news/2023-10/#interactive-jobs-on-day-on-mccleary","text":"Interactive jobs are now allowed to be run on the day partition on McCleary. Note you are still limited to 4 interactive-style jobs of any kind (salloc or OpenOnDemand) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the \"Delete\" button in your \"My Interactive Apps\" page in the web portal.","title":"Interactive jobs on day on McCleary"},{"location":"news/2023-10/#papermill-for-jupyter-command-line-execution","text":"Many scientific workflows start as interactive Jupyter notebooks, and our Open OnDemand portal has dramatically simplified deploying these notebooks on cluster resources. However, the step from running notebooks interactively to running jobs as a batch script can be challenging and is often a barrier to migrating to using sbatch to run workflows non-interactively. To help solve this problem, there are a handful of utilities that can execute a notebook as if you were manually hitting \"shift-Enter\" for each cell. Of note is Papermill which provides a powerful set of tools to bridge between interactive and batch-mode computing.
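Before the installation steps that follow, here is a hedged sketch of the command-line parameterization mentioned in this note; the notebook paths and parameter names are hypothetical, and the notebook itself would need a cell tagged \"parameters\" for the overrides to apply. # Run a parameterized copy of a notebook; each -p overrides a variable defined in the notebook's parameters cell papermill /path/to/notebook.ipynb /path/to/notebook_run1.ipynb -p sample_id 1 -p alpha 0.5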
To get started, install papermill into your Conda environments: module load miniconda conda install papermill Then you can simply evaluate a notebook, preserving figures and output inside the notebook, like this: papermill /path/to/notebook.ipynb This can be run inside a batch job that might look like this: #!/bin/bash #SBATCH -p day #SBATCH -c 1 #SBATCH -t 6:00:00 module purge miniconda conda activate my_env papermill /path/to/notebook.ipynb Variables can also be parameterized and passed in as command-line options so that you can run multiple copies simultaneously with different input variables. For more information see the [Papermill docs pages](https://papermill.readthedocs.io/.","title":"\"Papermill\" for Jupyter Command-Line Execution"},{"location":"news/2023-11/","text":"November 2023 Announcements Globus Available on Milgram Globus is now available to move data in and out from Milgram. For increased security, Globus only has access to a staging directory ( /gpfs/milgram/globus/$NETID ) where you can temporarily store data. Please see our documentation page for more information and reach out to hpc@yale.edu if you have any questions. RStudio Server Updates RStudio Server on the OpenDemand web portal for all clusters now starts an R session in a clean environment and will not save the session when you finish. If you want to save your session and reuse it next time, please select the checkbox \"Start R from your last saved session\".","title":"2023 11"},{"location":"news/2023-11/#november-2023","text":"","title":"November 2023"},{"location":"news/2023-11/#announcements","text":"","title":"Announcements"},{"location":"news/2023-11/#globus-available-on-milgram","text":"Globus is now available to move data in and out from Milgram. For increased security, Globus only has access to a staging directory ( /gpfs/milgram/globus/$NETID ) where you can temporarily store data. Please see our documentation page for more information and reach out to hpc@yale.edu if you have any questions.","title":"Globus Available on Milgram"},{"location":"news/2023-11/#rstudio-server-updates","text":"RStudio Server on the OpenDemand web portal for all clusters now starts an R session in a clean environment and will not save the session when you finish. If you want to save your session and reuse it next time, please select the checkbox \"Start R from your last saved session\".","title":"RStudio Server Updates"},{"location":"resources/","text":"Training & Other Resources The YCRC offers training sessions in a wide range of topics related to research computing taught by YCRC staff, HPC experts at national HPC centers or our vendor partners.","title":"Overview"},{"location":"resources/#training-other-resources","text":"The YCRC offers training sessions in a wide range of topics related to research computing taught by YCRC staff, HPC experts at national HPC centers or our vendor partners.","title":"Training & Other Resources"},{"location":"resources/glossary/","text":"Glossary To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. 
Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"resources/glossary/#glossary","text":"To help clarify the way we refer to certain terms in our user documentation, here is a brief list of some of the words that regularly come up in our documents. Please reach out to us at hpc@yale.edu if there are any other words that need to be added. Account - used to authenticate and grant permission to access resources Account (Slurm) - an accounting mechanism to keep track of a group's computing usage Activate - making something operational Array - a data structure across a series of memory locations consisting of elements organized in an index Array (job) - a series of jobs that all request the same resources and run the same batch script Array Task ID - a unique sequential number with an appended number that refers to an individual task within the set of submitted jobs Channel - Community-led collections of packages created by a group or lab installed with conda to allow for a homogenous environment across systems CLI - Command Line Interface processes commands to a computer program in the form of lines of text in a window Cluster - a set of computers, called nodes) networked together so nodes can perform the tasks facilitated by a scheduling software Command - a specific order from a computer to execute a service with either an application or the operating system Compute Node - the nodes that work runs on to perform computational work Container - A stack of software, libraries and operating system that is independent of the host computer and can be accessed on other computers Container Image - Self-contained read-only files used to run applications CPU - Central Processing Units are the components of a system that perform basic operations and exchange data with the system\u2019s memory (also known as a processor) Data - items of information collected together for reference or analysis Database - a collection of structured data held within a computer Deactivate - making something de-operational Environment - a collection of hardware, software, data storage and networks that work together in facilitating the processing and exchange of information Extension - Suffix at the end of a filename to indicate the file type Fileset - a section of a storage device that is given a designated purpose Filesystem - a process that manages how and where data is stored Flag - (see Options) GPU - Graphics Processing Units are specialized circuits designed to rapidly manipulate memory and create images in a frame buffer for a displayed output GridFTP - an extension of the Fire Transfer Protocol for grid computing that allows users to transfer and save data on a different account such as Google Drive or other off network memory Group - a collection of users who can all be given the same permissions on a system GUI - Graphical User Interface allows users to interact with devices and applications through a visual window that can commonly display icons and predetermined fields Hardware - the physical parts of a computer Host - (ie. 
Host Computer) A device connected to a computer network that offers resources, services and applications to users on the network Image - (See Container Image) Index - a method of sorting data by creating keywords or a listing of data Interface - a boundary across which two or more computer system components can exchange information Job - a unit of work given to an operating system by a scheduler Job Array - a way to submit multiple similar jobs by associating each subjob with an index value based on an array task id Key - a variable value applied using an algorithm to a block of unencrypted text to produce encrypted text Load - transfer a program or data into memory or into the CPU Login Node - a node that users log in on to access the cluster Memory - (see RAM) Metadata - A set of data that describes and gives basic information about other data Module - a number of distinct but interrelated units that build up or into a program MPI - Message Passing Interface is a standardized and portable message-passing standard designed to function on parallel computing architectures Multiprocessing - the ability to operate more than one task simultaneously on the same program across two or more processors in a computer Node - a server in the cluster Option - a single letter or full word that modifies the behavior of a command in a predetermined way (also known as a flag or switch) Package - a collection of hardware and software needed to create a working system Parallel - (ex. Computing/Programming) Architecture in which several processes are carried out simultaneously across smaller, independent parts Partition - a section of a storage device that is given a designated purpose Partition (Slurm) - a collection of compute nodes available via the scheduler Path - A string of characters used to identify locations throughout a directory structure Pane - (Associate with window) A subdivision within a window where an independent terminal can run simultaneously alongside another terminal Processor - (see CPU) Queue - a sequence of objects arranged according to priority waiting to be processed RAM - Random Access Memory, also known as \"Memory\" can be read and changed in any order and is typically used to to store working data Reproducibility - the ability to execute the same results across multiple systems by different individuals using the same data Scheduler - the software used to assign resources to a job for tasks Scheduling - the act of assigning resources to a task through a software product Session - a temporary information exchange between two or more devices SSH - secure shell is a cryptographic network protocol for operating network services securely over an unsecured network Software - a collection of data and instructions that tell a computer how to operate Switch - (see Options) System - a set of integrated hardware and software that input, output, process, and store data and information Task ID - a unique sequential number used to refer to a task Terminal - Referring to a terminal program, a text-based interface for typing commands Toolchain - a set of tools performing individual actions used in delivering an operation Unload - remove a program or data from memory or out of the CPU User - a person interacting and utilizing a computing service Variable - assigned and referenced data values that can be called within a program and changed depending on how the program runs Window - (Associate with pane) the whole screen being displayed, containing subdivisions, or panes, that can run independent 
terminals alongside each other","title":"Glossary"},{"location":"resources/intro_to_hpc_tutorial/","text":"Introduction to HPC Tutorials To begin, access the cluster through Open OnDemand and open the shell window. This can be done by by going to the top navigation bar, clicking on the Clusters tab and selecting the Shell Access button. Once the new shell window is loaded, you will be able use this interface like your local command interface. Now that you're setup in a shell window, you can begin the first task like so: Part 1: Interactive Jobs Inside of the shell window, start an interactive job with the default resource requests. Once you are allocated space off the login node, load the Miniconda module and create a Conda environment for this exercise. This can be done like so: # Ask for an interactive session salloc # Load the Miniconda module module load miniconda # Create a test environment with Conda that contains the default Python version conda create -yn tutorial_env python jupyter # Activate the new environment conda activate tutorial_env # Deactivate the new environment conda deactivate # Exit your interactive job to free the resources exit Part 2: Batch Jobs Going off of the environment we created in part 1 , navigate to the Files tab in OOD and select your project directory. Click the '+ New File' button and name the file message_decode_tutorial.py . Once the new file is created, open this file in the OOD text editor by going to the file, clicking the three-dot more button, and selecting edit in the dropdown menu like so: Once the text editor is open, paste this python example inside of the file: def message_decode_tutorial ( message , c ): holder = \"\" for letter in range ( 0 , len ( message )): if ( letter + 1 ) % c == 0 : holder = holder + message [ letter ] return holder message = 'gT baZu lWp Kjv uXyeS nViU fdlH gJr KaIc tBpl Sy \\ Jox MtUl Qbm kGTp UdHe hdLJf Nu IcPRu XhBtDjf TsmPf \\ o DoKfw xP qyTcJ tUpYrv Pk ArBCf Wrtp JfRcX JqPdKLC' cypher = message_decode_tutorial ( message , 10 ) with open ( '/home/NETID/decoded_example.txt' , 'w+' ) as output : print ( cypher , file = output ) This python function takes a given message and parses through it against the parameters of a cypher, which in our case writes every 10th letter. Make sure to replace the placeholder 'NETID' in the second to last line with your personal NetID. This will allow your output file to go into your homespace. From here, navigate back to your project directory and select the '+ New File' button, this time naming it batch_tutorial.sh . Using Slurm options to define resource requests for this job, paste the following code inside of this file like you did the previous file: #!/bin/bash #SBATCH --job-name=message_decode_tutorial #SBATCH --time=1:00 #SBATCH --mem-per-cpu=2MB #SBATCH --mail-type=ALL module load miniconda source activate tutorial_env python message_decode_tutorial.py Because the partition isn't specified for this job, it will run on the cluster's default partition. From there, you can go back to the shell window, navigate to your project directory and run the sbatch command to begin your batch job like so: # Navigate to the project directory cd project # Use Slurm to start a batch job sbatch batch_tutorial.sh Once you receive an email saying the job is complete, navigate to your home-space through the shell window on Open OnDemand. Within this directory you will find a file called decoded_example.txt . 
To quickly see the file contents, use the cat command to print the file's contents on the standard output, revealing the decoded message like so: # Navigate to your homespace (replacing NETID with your netID) cd /home/NETID # Print out the decoded message cat decoded_example.txt Part 3: Interactive Apps on OOD Now that you have completed both an interactive and batch job, try using Jupyter Notebooks on Open OnDemand for your work. This can be done in the shell window like so: # Purge any loaded modules module purge # Build your environment dropdown tab on OOD ycrc_conda_env.sh update Now that this is completed, return to the Open OnDemand homepage and select the Interactive Apps dropdown tab in the top navigation bar. From there you can select Jupyter and load the job submission request form. To select your resources, make sure to consult our Slurm documentation as well as the specific cluster's partition information to ensure you're selecting the appropriate resources for your job's needs. Once the session is submitted and running, connect to the notebook and navigate to your working directory. From there you can either select the Upload button to upload an existing Jupyter notebook file or select the New button to create a new notebook. To help with this, make sure to look over the YCRC Jupyter Notebook information as well as Jupyter's User Interface page .","title":"Introduction to HPC Tutorials"},{"location":"resources/intro_to_hpc_tutorial/#introduction-to-hpc-tutorials","text":"To begin, access the cluster through Open OnDemand and open the shell window. This can be done by by going to the top navigation bar, clicking on the Clusters tab and selecting the Shell Access button. Once the new shell window is loaded, you will be able use this interface like your local command interface. Now that you're setup in a shell window, you can begin the first task like so:","title":"Introduction to HPC Tutorials"},{"location":"resources/intro_to_hpc_tutorial/#part-1-interactive-jobs","text":"Inside of the shell window, start an interactive job with the default resource requests. Once you are allocated space off the login node, load the Miniconda module and create a Conda environment for this exercise. This can be done like so: # Ask for an interactive session salloc # Load the Miniconda module module load miniconda # Create a test environment with Conda that contains the default Python version conda create -yn tutorial_env python jupyter # Activate the new environment conda activate tutorial_env # Deactivate the new environment conda deactivate # Exit your interactive job to free the resources exit","title":"Part 1: Interactive Jobs"},{"location":"resources/intro_to_hpc_tutorial/#part-2-batch-jobs","text":"Going off of the environment we created in part 1 , navigate to the Files tab in OOD and select your project directory. Click the '+ New File' button and name the file message_decode_tutorial.py . 
Once the new file is created, open this file in the OOD text editor by going to the file, clicking the three-dot more button, and selecting edit in the dropdown menu like so: Once the text editor is open, paste this python example inside of the file: def message_decode_tutorial ( message , c ): holder = \"\" for letter in range ( 0 , len ( message )): if ( letter + 1 ) % c == 0 : holder = holder + message [ letter ] return holder message = 'gT baZu lWp Kjv uXyeS nViU fdlH gJr KaIc tBpl Sy \\ Jox MtUl Qbm kGTp UdHe hdLJf Nu IcPRu XhBtDjf TsmPf \\ o DoKfw xP qyTcJ tUpYrv Pk ArBCf Wrtp JfRcX JqPdKLC' cypher = message_decode_tutorial ( message , 10 ) with open ( '/home/NETID/decoded_example.txt' , 'w+' ) as output : print ( cypher , file = output ) This python function takes a given message and parses through it against the parameters of a cypher, which in our case writes every 10th letter. Make sure to replace the placeholder 'NETID' in the second to last line with your personal NetID. This will allow your output file to go into your homespace. From here, navigate back to your project directory and select the '+ New File' button, this time naming it batch_tutorial.sh . Using Slurm options to define resource requests for this job, paste the following code inside of this file like you did the previous file: #!/bin/bash #SBATCH --job-name=message_decode_tutorial #SBATCH --time=1:00 #SBATCH --mem-per-cpu=2MB #SBATCH --mail-type=ALL module load miniconda source activate tutorial_env python message_decode_tutorial.py Because the partition isn't specified for this job, it will run on the cluster's default partition. From there, you can go back to the shell window, navigate to your project directory and run the sbatch command to begin your batch job like so: # Navigate to the project directory cd project # Use Slurm to start a batch job sbatch batch_tutorial.sh Once you receive an email saying the job is complete, navigate to your home-space through the shell window on Open OnDemand. Within this directory you will find a file called decoded_example.txt . To quickly see the file contents, use the cat command to print the file's contents on the standard output, revealing the decoded message like so: # Navigate to your homespace (replacing NETID with your netID) cd /home/NETID # Print out the decoded message cat decoded_example.txt","title":"Part 2: Batch Jobs"},{"location":"resources/intro_to_hpc_tutorial/#part-3-interactive-apps-on-ood","text":"Now that you have completed both an interactive and batch job, try using Jupyter Notebooks on Open OnDemand for your work. This can be done in the shell window like so: # Purge any loaded modules module purge # Build your environment dropdown tab on OOD ycrc_conda_env.sh update Now that this is completed, return to the Open OnDemand homepage and select the Interactive Apps dropdown tab in the top navigation bar. From there you can select Jupyter and load the job submission request form. To select your resources, make sure to consult our Slurm documentation as well as the specific cluster's partition information to ensure you're selecting the appropriate resources for your job's needs. Once the session is submitted and running, connect to the notebook and navigate to your working directory. From there you can either select the Upload button to upload an existing Jupyter notebook file or select the New button to create a new notebook. 
To help with this, make sure to look over the YCRC Jupyter Notebook information as well as Jupyter's User Interface page .","title":"Part 3: Interactive Apps on OOD"},{"location":"resources/national-hpcs/","text":"National HPCs Beyond Yale\u2019s on campus clusters, there are a number of ways for researchers to obtain compute resources (both cycles and storage) at national facilities. Yale researchers may use the Data Management Planning Tool ( DMPtool ) to create, review, and share data management plans that are in accordance with institutional and funder requirements. ACCESS (formerly XSEDE) Quarterly | Application & Info \"Explore Allocations\" are readily available on ACCESS resources for benchmarking and planning runs. For even lower commitment allocations (e.g. to just explore the resource), YCRC staff members have \"Campus Champions\" allocations on all ACCESS resources that can be shared upon request. Contact us for access. ACCESS resources include the following. Up to date information is available at access-ci.org : Stampede2: traditional compute and Phis Jetstream: Science Gateways Bridges2: traditional compute and GPUs Comet: traditional compute and GPUs XStream: GPU cluster Department of Energy NERSC, Argonne Leadership Computing Facility (ALCF), Oak Ridge Leadership Computing Facility (OLCF) INCITE Due in June | Application & Info ALCC Due in June | Application & Info ANL Director\u2019s Discretionary Rolling submission | Application & Info 3-6 month duration. Expectation is that you are using it to gather data for ALCC or INCITE proposal OLCF Director\u2019s Discretionary Rolling submission | Application & Info NCSA: Blue Waters PRAC Due in November | Application & Info Blue Water\u2019s Innovation Allocations Rolling submission | Application & Info Open Science Grid (OSG) Rolling Submission | Application & Info The OSG facilitates access to distributed high throughput computing for research in the US. The resources accessible through the OSG are contributed by the community, organized by the OSG, and governed by the OSG consortium.","title":"National HPCs"},{"location":"resources/national-hpcs/#national-hpcs","text":"Beyond Yale\u2019s on campus clusters, there are a number of ways for researchers to obtain compute resources (both cycles and storage) at national facilities. Yale researchers may use the Data Management Planning Tool ( DMPtool ) to create, review, and share data management plans that are in accordance with institutional and funder requirements.","title":"National HPCs"},{"location":"resources/national-hpcs/#access-formerly-xsede","text":"Quarterly | Application & Info \"Explore Allocations\" are readily available on ACCESS resources for benchmarking and planning runs. For even lower commitment allocations (e.g. to just explore the resource), YCRC staff members have \"Campus Champions\" allocations on all ACCESS resources that can be shared upon request. Contact us for access. ACCESS resources include the following. 
Up to date information is available at access-ci.org : Stampede2: traditional compute and Phis Jetstream: Science Gateways Bridges2: traditional compute and GPUs Comet: traditional compute and GPUs XStream: GPU cluster","title":"ACCESS (formerly XSEDE)"},{"location":"resources/national-hpcs/#department-of-energy","text":"NERSC, Argonne Leadership Computing Facility (ALCF), Oak Ridge Leadership Computing Facility (OLCF)","title":"Department of Energy"},{"location":"resources/national-hpcs/#incite","text":"Due in June | Application & Info","title":"INCITE"},{"location":"resources/national-hpcs/#alcc","text":"Due in June | Application & Info","title":"ALCC"},{"location":"resources/national-hpcs/#anl-directors-discretionary","text":"Rolling submission | Application & Info 3-6 month duration. Expectation is that you are using it to gather data for ALCC or INCITE proposal","title":"ANL Director\u2019s Discretionary"},{"location":"resources/national-hpcs/#olcf-directors-discretionary","text":"Rolling submission | Application & Info","title":"OLCF Director\u2019s Discretionary"},{"location":"resources/national-hpcs/#ncsa-blue-waters","text":"","title":"NCSA: Blue Waters"},{"location":"resources/national-hpcs/#prac","text":"Due in November | Application & Info","title":"PRAC"},{"location":"resources/national-hpcs/#blue-waters-innovation-allocations","text":"Rolling submission | Application & Info","title":"Blue Water\u2019s Innovation Allocations"},{"location":"resources/national-hpcs/#open-science-grid-osg","text":"Rolling Submission | Application & Info The OSG facilitates access to distributed high throughput computing for research in the US. The resources accessible through the OSG are contributed by the community, organized by the OSG, and governed by the OSG consortium.","title":"Open Science Grid (OSG)"},{"location":"resources/online-tutorials/","text":"Online Tutorials Linux/Unix and Command Line Introduction to Linux YCRC Workshop: Practical Introduction to Linux , ( Video ) *Recommended Most Commonly Used Commands - RedHat.com Command Line for Beginners - Ubuntu.com Note: You can learn more about most commands you come across by typing \"man [command]\" into the terminal. awk (text extraction/parsing) awk is a tool for parsing text and extracting certain sections. It is particularly useful for extracting, and even reordering, columns out of tables in text files. Introduction to awk and examples of common usage In-depth guide to awk and more advanced usage grep Grep is a tool for searching command line output or files for a certain string (phrase) or regular expression. Introduction to grep and examples of common usage In-depth guide to grep and more advanced usage sed sed (Stream EDitor) is a tool for making substitutions in a text file. For example, it can be useful for cleaning (e.g. replace NAN with 0) or reformatting data files. The syntax sed uses for substitutions is common in Linux (for example, the same syntax is used in the VIM text editor). SSH (connecting to the clusters or other remote linux servers) Connecting to the Yale clusters Transfer files to/from the cluster Advanced SSH configuration In-depth guide to ssh Bashrc and Bash Profiles What is the .bashrc and .bash_profile ? [Set aliases for commonly used commands] [Environment variables] tar or tar.gz archive .tar or .tar.gz are common archive (compressed file) formats.
Software and data will frequently be distributed in one of these archive formats. The most common command for opening and extracting the contents of a tar archive is tar xvf archive.tar and, for a tar.gz archive, tar xvzf archive.tar.gz . See the following link(s) for more details on creating tar files and more advanced extraction options. Creating and extracting from a tar file Install Windows and Linux on the same computer Windows for Linux It is possible to run Linux terminals and applications from within a Windows installation using the \"Windows Subsystem for Linux\". Windows Subsystem for Linux Dual Boot \"Dual Boot\" means you have two separate installations for Windows and Linux, respectively, that you switch between by restarting your computer. Dual Boot Linux Mint and Windows Dual Boot Ubuntu and Windows Python Intro to Python Fantastic resource for anyone interested in Python LinkedIn Learning: Learning Python (Yale only) Parallel Programming with Python Quick Tutorial: Python Multiprocessing Parallel Programming with Python YCRC Workshop: Parallel Python mpi4py YCRC Workshop: mpi4py mpi4py example scripts Documentation for mpi4py R Intro to R Brief intro to R Thorough intro to R Another thorough intro to R foreach Using the foreach package - Steve Weston foreach + dompi Introduction to doMPI Matlab Mathworks Online Classes Singularity / Apptainer Documentation Singularity has officially been renamed Apptainer, but we expect no changes to its functionality. Apptainer Docs Page Singularity Google Groups Tutorials YCRC Workshop: Containers NIH tutorial on Singularity NVIDIA tutorial for using GPUs with Singularity","title":"Online Tutorials"},{"location":"resources/online-tutorials/#online-tutorials","text":"","title":"Online Tutorials"},{"location":"resources/online-tutorials/#linuxunix-and-command-line","text":"","title":"Linux/Unix and Command Line"},{"location":"resources/online-tutorials/#introduction-to-linux","text":"YCRC Workshop: Practical Introduction to Linux , ( Video ) *Recommended Most Commonly Used Commands - RedHat.com Command Line for Beginners - Ubuntu.com Note: You can learn more about most commands you come across by typing \"man [command]\" into the terminal.","title":"Introduction to Linux"},{"location":"resources/online-tutorials/#awk-text-extractionparsing","text":"awk is a tool for parsing text and extracting certain sections. It is particularly useful for extracting, and even reordering, columns out of tables in text files. Introduction to awk and examples of common usage In-depth guide to awk and more advanced usage","title":"awk (text extraction/parsing)"},{"location":"resources/online-tutorials/#grep","text":"Grep is a tool for searching command line output or files for a certain string (phrase) or regular expression. Introduction to grep and examples of common usage In-depth guide to grep and more advanced usage","title":"grep"},{"location":"resources/online-tutorials/#sed","text":"sed (Stream EDitor) is a tool for making substitutions in a text file. For example, it can be useful for cleaning (e.g. replace NAN with 0) or reformatting data files. The syntax sed uses for substitutions is common in Linux (for example, the same syntax is used in the VIM text editor). 
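To recap the tar commands described earlier on this page, here is a minimal sketch; my_project/, archive.tar, and archive.tar.gz are hypothetical names used only for illustration:

```bash
# Create a gzip-compressed archive of a directory
tar czvf archive.tar.gz my_project/

# Extract a plain .tar archive into the current directory
tar xvf archive.tar

# Extract a gzip-compressed .tar.gz archive
tar xvzf archive.tar.gz
```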
Introduction to sed and examples of common usage In-depth guide to sed and more advanced usage","title":"sed"},{"location":"resources/online-tutorials/#ssh-connecting-to-the-clusters-or-other-remote-linux-servers","text":"Connecting to the Yale clusters Transfer files to/from the cluster Advanced SSH configuration In-depth guide to ssh","title":"SSH (connecting to the clusters or other remote linux servers)"},{"location":"resources/online-tutorials/#bashrc-and-bash-profiles","text":"What is the .bashrc and .bash_profile ? [Set aliases for commonly used commands] [Environment variables]","title":"Bashrc and Bash Profiles"},{"location":"resources/online-tutorials/#tar-or-targz-archive","text":".tar or .tar.gz are common archive (compressed file) formats. Software and data will frequently be distributed in one of these archive formats. The most common command for opening and extracting the contents of a tar archive is tar xvf archive.tar and, for a tar.gz archive, tar xvzf archive.tar.gz . See the following link(s) for more details on creating tar files and more advanced extraction options. Creating and extracting from a tar file","title":"tar or tar.gz archive"},{"location":"resources/online-tutorials/#install-windows-and-linux-on-the-same-computer","text":"","title":"Install Windows and Linux on the same computer"},{"location":"resources/online-tutorials/#windows-for-linux","text":"It is possible to run Linux terminals and applications from within a Windows installation using the \"Windows Subsystem for Linux\". Windows Subsystem for Linux","title":"Windows for Linux"},{"location":"resources/online-tutorials/#dual-boot","text":"\"Dual Boot\" means you have two separate installations for Windows and Linux, respectively, that you switch between by restarting your computer. Dual Boot Linux Mint and Windows Dual Boot Ubuntu and Windows","title":"Dual Boot"},{"location":"resources/online-tutorials/#python","text":"","title":"Python"},{"location":"resources/online-tutorials/#intro-to-python","text":"Fantastic resource for anyone interested in Python LinkedIn Learning: Learning Python (Yale only)","title":"Intro to Python"},{"location":"resources/online-tutorials/#parallel-programming-with-python","text":"Quick Tutorial: Python Multiprocessing Parallel Programming with Python YCRC Workshop: Parallel Python","title":"Parallel Programming with Python"},{"location":"resources/online-tutorials/#mpi4py","text":"YCRC Workshop: mpi4py mpi4py example scripts Documentation for mpi4py","title":"mpi4py"},{"location":"resources/online-tutorials/#r","text":"","title":"R"},{"location":"resources/online-tutorials/#intro-to-r","text":"Brief intro to R Thorough intro to R Another thorough intro to R","title":"Intro to R"},{"location":"resources/online-tutorials/#foreach","text":"Using the foreach package - Steve Weston","title":"foreach"},{"location":"resources/online-tutorials/#foreach-dompi","text":"Introduction to doMPI","title":"foreach + dompi"},{"location":"resources/online-tutorials/#matlab","text":"Mathworks Online Classes","title":"Matlab"},{"location":"resources/online-tutorials/#singularity-apptainer","text":"","title":"Singularity / Apptainer"},{"location":"resources/online-tutorials/#documentation","text":"Singularity has officially been renamed Apptainer, but we expect no changes to its functionality. 
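As a small, hedged example of that rename in practice: on most installations the apptainer command and the legacy singularity command are interchangeable (my_image.sif below is a hypothetical container image, not one provided by the YCRC):

```bash
# Run a command inside a container image with the new name...
apptainer exec my_image.sif python --version

# ...or with the legacy name; both invocations behave the same
singularity exec my_image.sif python --version
```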
Apptainer Docs Page Singularity Google Groups","title":"Documentation"},{"location":"resources/online-tutorials/#tutorials","text":"YCRC Workshop: Containers NIH tutorial on Singularity NVIDIA tutorial for using GPUs with Singularity","title":"Tutorials"},{"location":"resources/sw_carpentry/","text":"Software Carpentry Researchers looking to learn the skills they need can utilize Software Carpentry 's in-house training as well as its community-led lesson development to get started. These in-house lessons are offered in both English and Spanish and go over Unix and Git basics as well as working with Python and R. To learn more about the community-based lessons available to users, see the Carpentries Lab page.","title":"Software Carpentry"},{"location":"resources/sw_carpentry/#software-carpentry","text":"Researchers looking to learn the skills they need can utilize Software Carpentry 's in-house training as well as its community-led lesson development to get started. These in-house lessons are offered in both English and Spanish and go over Unix and Git basics as well as working with Python and R. To learn more about the community-based lessons available to users, see the Carpentries Lab page.","title":"Software Carpentry"},{"location":"resources/yale_library/","text":"Yale Library The Yale Library has many resources available to cluster users. For more information about the Yale Library, see the Ask Yale Library page here . O'Reilly Safari eBooks The Yale Library offers access to the O'Reilly Safari eBooks collection through your Yale credentials. It can be accessed through the Safari eBooks access page . Be sure to sign in with your Yale email. Once logged on, users can access a variety of digital books and courses.","title":"Yale Library"},{"location":"resources/yale_library/#yale-library","text":"The Yale Library has many resources available to cluster users. For more information about the Yale Library, see the Ask Yale Library page here .","title":"Yale Library"},{"location":"resources/yale_library/#oreilly-safari-ebooks","text":"The Yale Library offers access to the O'Reilly Safari eBooks collection through your Yale credentials. It can be accessed through the Safari eBooks access page . Be sure to sign in with your Yale email. Once logged on, users can access a variety of digital books and courses.","title":"O'Reilly Safari eBooks"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 2075200c26721adb4133725606de87b1f0eee955..74f665eb54c76a7146e4df11c74b27fa70bb78b5 100644 GIT binary patch delta 14 VcmZ3+v5bRNzMF%?XCtd33jiEQ1494+ delta 14 VcmZ3+v5bRNzMF%CXCtd33jiAk0|Nj6