Chinese extractive question answering baseline using Formosa Grand Challenge 2020 dataset and BERT.

ylhsieh/formosa-grand-challenge-2020-baseline

Baseline for Formosa Grand Challenge 2020

  1. Requirements
  2. Install
    • Clone this repository
    • Download the FGC dataset, unzip it, and place it under a sub-directory named json
    • Download the DRCD corpus and place it under json
  3. Preprocess dataset
    • Run python FGC_merge_to_DRCD_json.py to merge FGC training data into DRCD
    • Run python FGC_mocks_to_DRCD_json.py to create the development set from the FGC mock tests
    • Run python FGC_final_to_DRCD.py to convert the official test set to DRCD format
  4. Run run_fgc_baseline.ipynb (can be run in Google Colab)
    • On a single Titan X GPU with 12 GB of memory, the hyperparameters listed here can be used
    • Multi-GPU support is in beta
  5. Test set performance: 15 of 50 questions answered correctly (30%)
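
The merge in step 3 can be pictured with a minimal sketch. It is not the code in FGC_merge_to_DRCD_json.py, which is not shown here. It assumes both inputs have already been brought into the SQuAD-style layout that DRCD uses (a top-level "data" list of articles, each with "paragraphs" holding a "context" and a list of "qas"). The function names merge_squad_style, count_questions, and bad_answer_spans are hypothetical, and the two tiny datasets are made-up stand-ins for the real files under json.

```python
def merge_squad_style(base: dict, extra: dict) -> dict:
    """Return a new dataset: base's articles followed by extra's articles."""
    return {
        "version": base.get("version", "unknown"),
        "data": list(base["data"]) + list(extra["data"]),
    }


def count_questions(dataset: dict) -> int:
    """Count every question across all articles and paragraphs."""
    return sum(
        len(paragraph["qas"])
        for article in dataset["data"]
        for paragraph in article["paragraphs"]
    )


def bad_answer_spans(dataset: dict) -> list:
    """List question ids whose answer text is not found at answer_start."""
    bad = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad.append(qa["id"])
    return bad


# Made-up stand-ins for the DRCD corpus and the converted FGC training data.
drcd = {
    "version": "demo",
    "data": [{
        "title": "台北",
        "paragraphs": [{
            "context": "台北是台灣的首都。",
            "qas": [{
                "id": "drcd-1",
                "question": "台灣的首都是哪裡?",
                "answers": [{"text": "台北", "answer_start": 0}],
            }],
        }],
    }],
}
fgc = {
    "version": "demo",
    "data": [{
        "title": "玉山",
        "paragraphs": [{
            "context": "玉山是台灣最高的山。",
            "qas": [{
                "id": "fgc-1",
                "question": "台灣最高的山是哪一座?",
                "answers": [{"text": "玉山", "answer_start": 0}],
            }],
        }],
    }],
}

merged = merge_squad_style(drcd, fgc)
print(count_questions(merged), bad_answer_spans(merged))
```

Checking answer spans after a merge is a cheap way to catch offset errors before fine-tuning, since extractive QA training depends on answer_start pointing at the answer text inside the context.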
