AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

This repository contains AmbigSNI_NLG dataset, developed for the study 'AmbigNLG: Addressing Task Ambiguity in Instruction for NLG.'

This dataset includes the ambiguity categories and their corresponding additional instructions to mitigate each ambiguity. It was constructed through an LLM-in-the-loop annotation process on the Super-natural instructions benchmark.

See full details in the paper: AmbigNLG: Addressing Task Ambiguity in Instruction for NLG.

🔧 AmbigNLG setup

Install dependencies, download the raw dataset of Super-natural instructions, and then build AmbigSNI_NLG dataset.

pip install -r requirements.txt
bash ./scripts/setup_data.sh
OUTPUT_PATH=XXXXX.jsonl; python ./scripts/setup.py --output_path $OUTPUT_PATH

🌲 Dataset Structure

data/ambigsni_nlg.jsonl file comprises 2,500 instances. Each instance is a dictionary with the following keys:

Key	Explanation
`id`	The unique identifier for each instance.
`ambiguity_categories`	List of assigned ambiguity categories (`planning`, `keywords`, `context`, `style`, `theme`, or `length`) assigned to each instance.
`additional_instructions`	The additional instruction for each assigned ambiguity category.
`split`	Dataset split usage (`demonstration` or `test`)

The data was constructed by LLM-in-the-loop annotation approach, where we manually curate and verify the dataset by guiding the LLM’s generation.

Please refer to the details in Section 4 of our paper.

Example Instance

{
  "id": "task1598-4e8b21aebcc54e61b232f9d14c43e09d",
  "ambiguity_categories": ["keywords"],
  "additional_instructions": {
    "planning": null,
    "keywords": "Include ['xname coffee shop', 'moderately priced', 'xnear', 'food'] in your response.",
    "context": null,
    "style": null,
    "theme": null,
    "length": null
  },
  "split": "test"
}

🗒 Notes

Recruit Co., Ltd. (hereinafter referred to as "Recruit") provides this dataset, which includes linguistic annotations (hereinafter referred to as the "Dataset"), with the goal of advancing research in natural language processing.
This Dataset is constructed by annotating the ambiguity categories and the corresponding additional instruction through LLM-in-the-loop annotation on Super-Natural Instructions benchmark. The annotations do not represent the views or evaluations of Recruit.
Please note that the Dataset may contain content that is inaccurate or inconsistent with actual facts.
This Dataset is subject to change or deletion without notice.

🖋 License and Attribution

When publishing a study using this dataset, please cite papers in References and describe the source of the data as follows.
- Example: To conduct this study, we used "AmbigNLG_SNI dataset" (https://github.com/megagonlabs/ambignlg) provided by Recruit Co., Ltd.
The license of this Dataset is in the same scope as Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

🚨 Prohibitions

Recruit discloses this Dataset for non-profit public use. It is strictly prohibited to use for profit purposes beyond the scope necessary for the presentation of analysis, research and results.
Even when publishing research results, users should not post data in the Dataset beyond the appropriate exemplary range in the publications and other materials set forth in the preceding paragraph. Users should not describe information obtained from the Dataset that violates public order and morals, promote or encourage criminal or other illegal acts.

📘 Reference

If you make use of AmbigNLG_SNI, please cite the following paper:

@inproceedings{niwa2024ambignlg,
      title={AmbigNLG: Addressing Task Ambiguity in Instruction for NLG}, 
      author={Ayana Niwa and Hayate Iso},
      month={nov},
      year={2024},
      booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
      publisher={Association for Computational Linguistics},
      url={https://arxiv.org/abs/2402.17717}, 
}

📩 Contact

If you have any inquiries and/or problems about a dataset or notice a mistake, please contact NLP Data Support Team nlp_data_support at r.recruit.co.jp.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
img		img
scripts		scripts
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

🔧 AmbigNLG setup

🌲 Dataset Structure

🗒 Notes

🖋 License and Attribution

🚨 Prohibitions

📘 Reference

📩 Contact

About

Releases

Packages

Contributors 2

Languages

License

megagonlabs/ambignlg

Folders and files

Latest commit

History

Repository files navigation

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

🔧 AmbigNLG setup

🌲 Dataset Structure

🗒 Notes

🖋 License and Attribution

🚨 Prohibitions

📘 Reference

📩 Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages