ConEL: Conversational Entity Linking Datasets

This repository provides the resources related to entity linking annotations in conversational settings.
These resources are built on the following existing datasets: MWOZ, QuAC, WoW, and TREC-CAsT 2020.

These resources were developed as part of the following paper:

Hideaki Joko, Faegheh Hasibi, Krisztian Balog, and Arjen P. de Vries. “Conversational Entity Linking: Problem Definition and Datasets”.

The repository is structured as follows:

./data: MTurk entity annotations
./mturk_interfaces: MTurk interface used to collect the entity annotations

Data

MTurk entity annotation data is stored in ./data.

./data/ConEL_Concept_Named_Entity/: Stratified samples
- ./data/ConEL_Concept_Named_Entity/ConEL_CNE.json: Entity annotations for 25 dialogues from each dataset (i.e., MWOZ, QuAC, WoW, and TREC-CAsT 2020).
./data/ConEL_Personal_Entity/: WoW with personal entities
- ./data/ConEL_Personal_Entity/ConEL_PE.json: 25 WoW dialogues, each of which contains personal entities.
The run folders contain the results of the EL tools.

Statistics

	Stratified samples (`ConEL_CNE.json`)	WoW with personal entities (`ConEL_PE.json`)
# dialogues	100	25
# user utterances	708	113
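
The counts above can be checked against a loaded file with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes each ground truth file parses as a JSON list of dialogue dicts (see Data Format) and that "# user utterances" counts turns whose speaker is "USER". The function names `load_dialogues` and `count_statistics` and the tiny sample are hypothetical.

```python
import json


def load_dialogues(path):
    """Load a ground truth file (assumed to be a JSON list of dialogue dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_statistics(dialogues):
    """Return (number of dialogues, number of user utterances)."""
    n_user_utterances = sum(
        1
        for dialogue in dialogues
        for turn in dialogue["turns"]
        if turn["speaker"] == "USER"  # assumption: only USER turns are counted
    )
    return len(dialogues), n_user_utterances


# Hypothetical in-memory sample with the same shape as the real data.
sample = [
    {
        "dialogue_id": "a",
        "dataset_name": "wow",
        "turns": [
            {"speaker": "USER", "utterance": "first user turn", "turn_number": 0},
            {"speaker": "SYSTEM", "utterance": "system reply", "turn_number": 1},
            {"speaker": "USER", "utterance": "second user turn", "turn_number": 2},
        ],
    },
    {
        "dialogue_id": "b",
        "dataset_name": "quac",
        "turns": [
            {"speaker": "USER", "utterance": "only user turn", "turn_number": 0},
        ],
    },
]

sample_stats = count_statistics(sample)
print(sample_stats)
```

To compare with the table, one could call `count_statistics(load_dialogues("./data/ConEL_Concept_Named_Entity/ConEL_CNE.json"))`, provided the file loads as plain JSON.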

Data Format

This section explains the data format of the ground truth files (ConEL_CNE.json and ConEL_PE.json).
Each file is a list in which every element is a dict with the following structure:

{
    "dialogue_id": "10060",
    "dataset_name": "wow", # or "quac", "mwoz", "cast20raw", "cast20manu"
    "turns": [
        {
            "speaker": "USER", # or "SYSTEM"
            "utterance": "Blue is my favorite color, by far. What's yours?",
            "turn_number": 0, 
            "el_annotations": [ # Ground truth annotations
                {
                    "mention": "Blue",
                    "entity": "Blue",
                    "entity_type": "concept", # or "named_entity"
                }
            ],
            "personal_entity_annotations": [ # Personal entity annotations
                {
                    "personal_entity_mention": "my favorite color",
                    "explicit_entity_mention": "Blue",
                    "turn_number_of_explicit_entity_mention": 0,
                    "entity": "Blue"
                }
            ]
        },
    ]
}

dialogue_id: dialogue id provided by each original dataset (i.e., MWOZ, QuAC, WoW, and TREC-CAsT 2020).
dataset_name: name of the dataset the dialogue was taken from (cast20raw and cast20manu represent )
turns: each element is a user or system turn
- speaker: USER or SYSTEM
- utterance: utterance taken from the dataset. (Note that for TREC-CAsT 2020 system turns, only the manual_canonical_result_id is shown.)
- el_annotations: entity linking annotations made by MTurk workers
- personal_entity_annotations: Personal entity annotations. Note that only ConEL_PE.json has these annotations.
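
As an illustration of the fields listed above, the following sketch walks over the turns of each dialogue and flattens the annotations into tuples. It is a hedged example rather than code from the repository: the helper names `iter_entity_links` and `iter_personal_entities` are hypothetical, and `.get` is used because it is an assumption that some turns (for example system turns, or any turn in ConEL_CNE.json) may lack an annotation key.

```python
def iter_entity_links(dialogues):
    """Yield (dialogue_id, turn_number, mention, entity, entity_type) tuples."""
    for dialogue in dialogues:
        for turn in dialogue["turns"]:
            # Assumption: a turn without annotations may omit the key entirely.
            for ann in turn.get("el_annotations", []):
                yield (
                    dialogue["dialogue_id"],
                    turn["turn_number"],
                    ann["mention"],
                    ann["entity"],
                    ann.get("entity_type"),
                )


def iter_personal_entities(dialogues):
    """Yield (dialogue_id, personal_entity_mention, entity) tuples."""
    for dialogue in dialogues:
        for turn in dialogue["turns"]:
            # Only ConEL_PE.json carries this key, so default to an empty list.
            for ann in turn.get("personal_entity_annotations", []):
                yield (
                    dialogue["dialogue_id"],
                    ann["personal_entity_mention"],
                    ann["entity"],
                )


# The example dialogue from this section, written as a Python literal.
example = [
    {
        "dialogue_id": "10060",
        "dataset_name": "wow",
        "turns": [
            {
                "speaker": "USER",
                "utterance": "Blue is my favorite color, by far. What's yours?",
                "turn_number": 0,
                "el_annotations": [
                    {"mention": "Blue", "entity": "Blue", "entity_type": "concept"}
                ],
                "personal_entity_annotations": [
                    {
                        "personal_entity_mention": "my favorite color",
                        "explicit_entity_mention": "Blue",
                        "turn_number_of_explicit_entity_mention": 0,
                        "entity": "Blue",
                    }
                ],
            }
        ],
    }
]

entity_links = list(iter_entity_links(example))
personal_entities = list(iter_personal_entities(example))
print(entity_links)
print(personal_entities)
```

Yielding flat tuples keeps the sketch independent of any evaluation tool; the same generators could be applied to a list loaded from either ground truth file.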

MTurk Interfaces

The MTurk interfaces used to collect the entity annotations are stored in the ./mturk_interfaces directory.

Conversational Dataset List

Conversational Dataset List: A comprehensive list of around 130 conversational datasets released by different research communities

Cite

@inproceedings{Joko:2021:CEL,
 author =    {Joko, Hideaki and Hasibi, Faegheh and Balog, Krisztian and de Vries, Arjen P.},
 title =     {Conversational Entity Linking: Problem Definition and Datasets},
 booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series =    {SIGIR '21},
 year =      {2021},
 publisher = {ACM}
}

Contact

If you have any questions, please contact Hideaki Joko at hideaki.joko@ru.nl
