
Resources


ITaP Research Computing is providing each team with exclusive access to high performance compute nodes. This includes not only a powerful system with many cores and abundant memory, but also a high performance file system for working with the dataset. Full details on Purdue's community cluster program are available on their website.


Account

Each team should have been given a temporary username (something like "trainXX") and its associated password. Each member of the team will use the same credentials to access resources.

Note: Since you will share the same account with your teammates, be sure to isolate your work in a separate folder (i.e., clone the repository under your own name so you don't clobber each other's work)!



Computing

You can get onto the cluster in a few different ways. The classic approach is to log in via secure shell from the command line (i.e., ssh trainXX@brown.rcac.purdue.edu). Alternatively, a remote desktop interface is available via your browser:

https://desktop.brown.rcac.purdue.edu

Both of these and more are accessible via the new Gateway interface. To access your node, you'll need to request an interactive job using the queue we've set up for this competition (e.g., -q dscomp from the command line, or by selecting dscomp from the dropdown on Gateway).
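For example, assuming a PBS-style scheduler (which the -q flag above suggests), an interactive session from the command line might look like the following sketch; the node count, cores, and walltime are only illustrative, so adjust them to your needs:

trainXX ~$ # request an interactive session on the competition queue (illustrative resource limits)
trainXX ~$ qsub -I -q dscomp -l nodes=1:ppn=24 -l walltime=04:00:00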


Data Storage

High performance computing clusters employ multiple data storage systems, each designed for a specific purpose. Each of these file systems is networked and available on every node in the cluster in exactly the same way (i.e., at the same path). With your account, you will have both a home directory and a scratch directory.

Your home directory (e.g., /home/trainXX/) is extremely limited in capacity (25 GB) and will not be sufficient for this competition. Your scratch directory is on a different file system but accessible in the same way (e.g., /scratch/brown/trainXX/). There is a 200 TB per-user quota on scratch, so space will not be a concern.
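If you want to confirm your usage against these limits, a minimal check, assuming the myquota utility documented for Purdue RCAC clusters is available on your node:

trainXX ~$ # report current usage and limits for home and scratch
trainXX ~$ myquota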

Your first order of business should be to clone your fork of this repository there, and then copy over the datasets from the public space.

trainXX ~$ cd /scratch/brown/trainXX
trainXX ~$ mkdir data
trainXX ~$ cp /home/glentner/public/datasets/noaa/gsod.*.csv data/
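(The datasets live on the same shared file system, so a plain cp suffices.) For the clone itself, a minimal sketch; the fork URL and folder name below are placeholders, so substitute your own GitHub username, and note the per-member folder echoes the account-sharing caveat above:

trainXX ~$ # clone your fork into a folder named after you to avoid clobbering teammates
trainXX ~$ git clone https://github.com/<your-username>/<this-repo>.git yourname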

Software

More details will be provided on the Technology wiki page. Most of the software tools, frameworks, and compilers you could want are already provided on the cluster and are available via the module system.
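As a quick sketch of the module workflow (the module name below is only an example; run module avail to see what's actually installed on the cluster):

trainXX ~$ # list available software, load what you need, then confirm what's loaded
trainXX ~$ module avail
trainXX ~$ module load anaconda
trainXX ~$ module list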



Previous: Scoring   |   Next: Technology
