Overview

Working as a team or individually, participants will extract data from, validate, analyze, visualize, and produce insights from a large dataset.

You will complete these distinct aspects of the challenge in stages. Each stage is worth points towards a total score. The challenge is otherwise largely open-ended.

Dataset

The data for this challenge comes from the National Oceanic and Atmospheric Administration (NOAA) daily global surface summaries (GSOD) from over 9000 weather stations across the planet, some going all the way back to 1929.

For a full description of the dataset see the Dataset wiki page.

Challenge

There is a veritable plethora of avenues to explore with this dataset: studies of trends in weather patterns across geographic regions and over time, climate change research, and more. Purdue University has numerous research groups conducting novel climate research, no doubt making use of similar historical data.

For the purposes of today's challenge, we'll set our sights a bit lower. Today's goal is for participants to demonstrate both their prowess in computational tasks and their ability to explore a dataset and tell a story about what it holds and what it offers.

Participants will extract data from weather stations near Indiana going back as far as possible. See the Challenge wiki page for details on each of the five stages of the competition.
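As a rough illustration of what "near Indiana" could mean in practice, the sketch below keeps only stations whose coordinates fall inside a bounding box around the state. Everything in it is an assumption made for illustration: the `Station` record, the `near_indiana` helper, the approximate box coordinates, the one-degree margin, and the sample stations (which are invented, not taken from the dataset). The Challenge wiki page remains the authority on which stations count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Station:
    """Hypothetical station record; the real station list may use other fields."""
    station_id: str
    lat: float  # degrees north
    lon: float  # degrees east (negative values are west)


# Approximate bounding box for Indiana in degrees: (south, north, west, east).
# These bounds are an assumption for illustration, not an official definition.
INDIANA_BOX = (37.7, 41.8, -88.1, -84.7)


def near_indiana(stations: List[Station], margin_deg: float = 1.0) -> List[Station]:
    """Return stations inside the Indiana box widened by margin_deg on every side."""
    south, north, west, east = INDIANA_BOX
    return [
        s for s in stations
        if south - margin_deg <= s.lat <= north + margin_deg
        and west - margin_deg <= s.lon <= east + margin_deg
    ]


# Invented sample stations, used only to exercise the filter.
sample = [
    Station("A", 40.4, -86.9),  # inside the box
    Station("B", 51.5, -0.1),   # far outside the box
    Station("C", 42.5, -85.0),  # just north of the box, within the margin
]

selected_ids = [s.station_id for s in near_indiana(sample)]
print(selected_ids)
```

A rectangular box is the simplest possible choice; a distance-based filter or a list of neighboring states would be equally valid readings of "near".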

Submission

Official submissions are made by opening a Pull Request to this repository. For details on how this works and what is expected, see the Submission wiki page.

Scoring

The winner of today's competition will be decided by tallying the points awarded for completion (1), mastery (2), and excellence (3) in each of the five stages. Additional points will be awarded for observable teamwork, presentation skills, and professionalism. For full details on how scoring works, see the Scoring wiki page.
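To make the arithmetic concrete, here is a minimal sketch of the stage tally. It assumes that each stage earns exactly one level (so a stage is worth 0, 1, 2, or 3 points rather than a cumulative 1 + 2 + 3), and it leaves out the additional points for teamwork, presentation, and professionalism because their values are not given here. The names `LEVEL_POINTS` and `stage_total` and the example results are hypothetical; the Scoring wiki page defines the actual rules.

```python
from typing import Dict

# Points per level as stated above; "none" is an assumed label for a stage not completed.
LEVEL_POINTS: Dict[str, int] = {
    "none": 0,
    "completion": 1,
    "mastery": 2,
    "excellence": 3,
}


def stage_total(levels: Dict[str, str]) -> int:
    """Sum the points for the level reached in each stage."""
    return sum(LEVEL_POINTS[level] for level in levels.values())


# Hypothetical results for one team across the five stages.
example = {
    "stage 1": "excellence",
    "stage 2": "mastery",
    "stage 3": "mastery",
    "stage 4": "completion",
    "stage 5": "none",
}

print(stage_total(example))  # 3 + 2 + 2 + 1 + 0 = 8
```

Under this reading, the highest possible stage total is 5 × 3 = 15 points before any additional points are added.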

Resources

This challenge includes access to Purdue's Brown supercomputing cluster. Each team/participant will be given a trial account with a single node reservation. This not only provides the much-needed resources for working with datasets such as this but also puts all participants on a level playing field. For full details on the resources offered for this challenge and how to use them, see the Resources wiki page.

Technology

There is no restriction on what tools, technologies, or programming languages can be used for this competition. The expectation is that participants will deploy a combination of command line tools and custom code written in their preferred language to fulfill each aspect of the challenge. For full details on what is expected, see the Technology wiki page.

Presentation

At the end of the competition, each team will give a 10-minute presentation on their work to the judges. For full details on what is expected, see the Presentation wiki page.

Previous: Home | Next: Dataset

AITP Computing Challenge Day 2019	Data Science Challenge	Research Computing
