This is a template repository to prepare your submission for phase 1 of the Predicting Fertility Data Challenge (PreFer) through the Next platform. The challenge is to predict whether an individual will have a child within a three-year period (2021-2023), based on survey data from previous years (2007-2020). Data come from the LISS panel. For more information on the data challenge, please visit the website or read this paper.
ℹ️ Check out the Wiki for challenge scope, leaderboards, and frequently asked questions.
- Make a copy of this template repository by forking and cloning it as explained here. Use your own copy of the repository to prepare your method for submission as explained here.
- Make sure to allow GitHub Actions on your own repository: go to the “Actions” tab and click “I understand my workflows, go ahead and enable them.”
- If you have not already done so, download the training data and codebooks via the "Download Data" task on the Next platform. ❗️ Important: you are not allowed to share these datasets, and you may not upload them to your GitHub repository!
ℹ️ Click here for a detailed explanation on the datasets that you have downloaded. Click here for an explanation on how to use the codebooks.
To participate in the challenge you need to submit a method using this repository.
- Choose your programming language: the default set-up is Python. If you would like to use R, go to `settings.json` and change `{"dockerfile": "python.Dockerfile"}` into `{"dockerfile": "r.Dockerfile"}`. Read here how to update files in your forked repository. ℹ️ For Python, this repo assumes that your method uses the Anaconda Python distribution.
- Choose the main script to work with: go to `submission.py` for Python or `submission.R` for R.
- Preprocess the data: add any steps to clean or preprocess the data to the `clean_df` function in the `submission.py`/`submission.R` script, with documentation. Note: the `clean_df` function will also be applied to the holdout data when you submit your model. At this point, the codebooks can be useful to make sense of the data.
- Train, tune, and save your model: add any steps to train your model to the `training.py`/`training.R` script, with documentation (e.g., code for the model, number of folds, set seed). The only function in this script is `train_save_model`, in which you can add the steps needed to run the model. The output of this script is your saved model, e.g. `model.joblib` for Python or `model.rds` for R. Make sure that your model is saved in the same folder as `submission.py`/`submission.R` under the name `model.joblib` for Python or `model.rds` for R. You can save the model in another format as well.
- Test your model on fake data: you can test your `clean_df` function and your model (stored in `model.joblib`/`model.rds`) on the fake data (`PreFer_fake_data.csv`) with the `predict_outcomes` function. The `predict_outcomes` function in `submission.py`/`submission.R` will be run on the holdout data to generate your challenge submission result on the leaderboard. Make sure that the outputs of your model are predicted classes (i.e. 0s and 1s) rather than, for example, probabilities. If you saved the model in another format (not 'joblib' for Python or 'rds' for R), update the way the model is loaded. Also, make sure to add or edit dependencies when required, as described here. If your method does not run on the fake data, it will not run on the holdout data. If you passed the test (i.e. `predict_outcomes` led to predictions rather than errors), you can start submitting your method.
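The preprocessing step can be sketched in Python as follows. This is not the template's code: the identifier column `nomem_encr`, the feature names, and the optional `background_df` argument are assumptions for illustration, so pick real variables with the help of the codebooks. The sketch shows the two properties that matter because `clean_df` also runs on the holdout data: it never touches the outcome, and it keeps one row per respondent with no missing values left for the model.

```python
from typing import Optional

import pandas as pd

ID_COL = "nomem_encr"                      # assumed respondent identifier column
FEATURES = ["age", "partner", "children"]  # hypothetical feature names


def clean_df(df: pd.DataFrame, background_df: Optional[pd.DataFrame] = None) -> pd.DataFrame:
    """Select and tidy the columns the model uses (illustrative sketch).

    The column names and the optional background_df argument are assumptions;
    background_df is accepted but unused here. Because this function also runs on
    the holdout data, it must not use the outcome and must keep one row per person.
    """
    out = df[[ID_COL] + FEATURES].copy()
    for col in FEATURES:
        # Convert to numbers; values that cannot be parsed become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Fill missing values with a fixed sentinel so the model never sees NaN.
    out[FEATURES] = out[FEATURES].fillna(-1)
    return out


# Usage on a tiny made-up frame (not real LISS data).
fake = pd.DataFrame({
    "nomem_encr": [1, 2],
    "age": ["30", None],
    "partner": [1, 0],
    "children": [0, "x"],
    "unused": ["a", "b"],
})
cleaned = clean_df(fake)
print(cleaned)
```

A fixed sentinel such as -1 is used instead of, say, a column mean so that the transformation is identical whether it runs on the training data, the fake data, or the holdout data.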
ℹ️ Check out this website for guides, notebooks, and blogs to guide you through this process.
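The training and prediction steps can be sketched in the same way. Again, this is a hypothetical illustration rather than the template's implementation: the function signatures, the outcome column name `new_child`, the output column `prediction`, and the choice of scikit-learn's `LogisticRegression` are assumptions. It shows the points the steps above insist on: the model is saved with joblib as `model.joblib`, and `predict_outcomes` returns class labels (0 or 1) rather than probabilities.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

ID_COL = "nomem_encr"      # assumed respondent identifier column
OUTCOME_COL = "new_child"  # assumed name of the outcome column


def train_save_model(cleaned_df, outcome_df, model_path="model.joblib"):
    """Fit a simple classifier and save it with joblib (illustrative sketch)."""
    # Align features and outcomes on the ID; drop people whose outcome is unknown.
    data = cleaned_df.merge(outcome_df[[ID_COL, OUTCOME_COL]], on=ID_COL)
    data = data.dropna(subset=[OUTCOME_COL])
    X = data.drop(columns=[ID_COL, OUTCOME_COL])
    y = data[OUTCOME_COL].astype(int)
    # The default lbfgs solver is deterministic, so no random seed is needed here.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    joblib.dump(model, model_path)
    return model


def predict_outcomes(cleaned_df, model_path="model.joblib"):
    """Return one predicted class (0 or 1) per respondent (illustrative sketch).

    In the template, clean_df would be applied to the raw data first; this sketch
    assumes cleaned_df has already been through it.
    """
    model = joblib.load(model_path)
    X = cleaned_df.drop(columns=[ID_COL])
    # predict() returns class labels; predict_proba() would return probabilities,
    # which are not the expected output.
    predictions = model.predict(X)
    return pd.DataFrame({ID_COL: cleaned_df[ID_COL].to_numpy(),
                         "prediction": predictions.astype(int)})


# Usage on a tiny made-up dataset, saving the model to a temporary folder.
cleaned = pd.DataFrame({"nomem_encr": [1, 2, 3, 4],
                        "age": [25, 30, 35, 40],
                        "partner": [0, 1, 1, 0]})
outcomes = pd.DataFrame({"nomem_encr": [1, 2, 3, 4], "new_child": [0, 1, 1, 0]})
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
train_save_model(cleaned, outcomes, model_path=path)
result = predict_outcomes(cleaned, model_path=path)
print(result)
```

Calling `predict` rather than `predict_proba` is what keeps the output as classes; if you prefer to choose your own probability threshold, convert the probabilities to 0s and 1s before returning them.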
Submit your method via the "Submit Method" task on the Next platform by providing a link to the repository with your method (GitHub commit URL). Follow the instructions below:
- Make sure that you describe your model in the `description.md` file in your GitHub repository and commit the changes (i.e. save them locally).
- Push the commit (i.e. upload the changed version to your online repository). ❗️ Important: make sure that you only push the relevant files and that you do not upload any of the datasets.
- In GitHub, make sure that the checks pass:
ℹ️ If the check fails, go to FAQ. You might need to add dependencies as described here.
- On the main page of your repository, above the file list, click "Commits" to view a list of commits. Do NOT click "N commits ahead of". See example below:
- Go to the commit that you want to submit, right-click on "view commit details", then click "Copy Link Address". See example below:
- Add a submission on the Next platform by providing the URL to your GitHub commit (copied in the previous step); this commit will serve as your submission to the challenge.
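Before pasting the link into the Next platform, a quick local check can catch a repository or comparison link copied by mistake. This is a hypothetical helper, not part of the template or the Next platform. The expected URL shape (`https://github.com/<owner>/<repo>/commit/<40-character SHA>`) is an assumption based on how GitHub formats full commit links, and the owner and repository names below are made up.

```python
import re

# Assumed shape of a full commit link copied from GitHub:
# https://github.com/<owner>/<repo>/commit/<40-character hexadecimal SHA>
COMMIT_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/commit/[0-9a-f]{40}")


def looks_like_commit_url(url: str) -> bool:
    """Return True if url has the shape of a full GitHub commit link."""
    return COMMIT_URL.fullmatch(url.strip()) is not None


sha = "0123456789abcdef0123456789abcdef01234567"
print(looks_like_commit_url(f"https://github.com/alice/PreFer-fork/commit/{sha}"))     # True
print(looks_like_commit_url("https://github.com/alice/PreFer-fork"))                   # False: repository page
print(looks_like_commit_url("https://github.com/alice/PreFer-fork/compare/main...x"))  # False: comparison page
```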
ℹ️ Leaderboards are generated at fixed time points, check out important dates for leaderboard submission deadlines. Check out the Wiki for more info on the leaderboards.
This project is licensed under the terms of the MIT license.
The code in this repository is developed by Eyra as part of the Rank program funded by ODISSEI and the NWO VIDI grant awarded to Gert Stulp. The LISS panel data is provided by Centerdata.