

Shifts Challenge

This repository contains data readers and examples for the three tracks of the Shifts Dataset and the Shifts Challenge.

The Shifts Dataset contains curated and labelled examples of real, 'in-the-wild' distributional shift across three large-scale tasks: tabular weather prediction, machine translation, and vehicle motion prediction. Dataset shift is ubiquitous in all of these tasks and modalities. The dataset, assessment metrics and benchmark results are detailed in our associated paper: Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

If you use the Shifts Dataset in your work, please cite our paper using the following BibTeX entry:

@article{shifts2021,
  author    = {Malinin, Andrey and Band, Neil and Ganshin, Alexander and Chesnokov, German and Gal, Yarin and Gales, Mark J. F. and Noskov, Alexey and Ploskonosov, Andrey and Prokhorenkova, Liudmila and Provilkov, Ivan and Raina, Vatsal and Raina, Vyas and Roginskiy, Denis and Shmatova, Mariya and Tigar, Panos and Yangel, Boris},
  title     = {Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks},
  journal   = {arXiv preprint arXiv:2107.07455},
  year      = {2021},
}

If you have any questions about the Shifts Dataset, the paper or the benchmarks, please contact am969@yandex-team.ru.

Dataset Download And Licenses

License

The Shifts dataset is released under a mixed license.

Weather Prediction

The Shifts Weather Prediction Dataset is released under the CC BY NC SA 4.0 license. This dataset was constructed by combining features from publicly available weather prediction services and models. Specifically, we combined data from NOAA/NWS servers, data generated by the WRF model from NCAR/UCAR, and data from the Meteorological Service of Canada. Ground station readings were taken from [NOAA](https://www.weather.gov/disclaimer). The data was cleaned and the features were standardized.

Machine Translation

The Shifts Machine Translation Dataset is released under a mixed license.

GlobalVoices evaluation data is released under CC BY NC SA 4.0.

The English source data was taken from GlobalVoices (originally licensed under CC BY 3.0), and the target Russian translations were provided by Yandex in-house professional translators.

The source-side text for the Reddit development and evaluation datasets exists under the terms of the Reddit API. The target-side Russian sentences were obtained by Yandex via in-house professional translators and are released under CC BY NC SA 4.0. We highlight that the development set source sentences are the same ones used in the MTNT dataset.

Motion Prediction

The Shifts SDC Motion Prediction Dataset is released under the CC BY NC SA 4.0 license.

Download links

As the Shifts Challenge is currently underway, we are only releasing the full training and development sets of the canonical partition for all tasks of the Shifts Dataset, as detailed in our paper. Evaluation data without ground-truth labels or metadata will be released on October 17th, 2021. The evaluation data labels and ground-truth predictions, as well as the full Shifts Dataset, will become available on November 1st, 2021, after the Shifts Challenge concludes.

By downloading the Shifts Dataset, you automatically agree to the licenses described above.

Weather Prediction

The canonical partition of the training and development data can be downloaded here. Baseline models can be downloaded here.

Machine Translation

The training data for this task is the WMT'20 En-Ru dataset, which can be downloaded here; the development data can be downloaded here. All data is automatically downloaded via the scripts provided here. Baseline models can be downloaded here.

Motion Prediction

The canonical partition of the training and development data can be downloaded here. Baseline models can be downloaded here.
