Accommodation Search Dialog Corpus (in Japanese)

Creative Commons Attribution 4.0 International License

Main part: data/main

The main part of this corpus consists of 210 Japanese dialogs between two people acting as a customer and an operator in a fictitious accommodation consultation service, conducted over Slack. In each dialog, the customer told the operator about their situation and needs, and the operator then searched for accommodations that met the customer's request. A dialog ended once the operator judged that the requirements were specific enough to narrow down suitable accommodations. Dialogs are provided in two formats:

  • Text: data/main/dialog/text/*.tsv
  • JSON: data/main/dialog/json/*.json
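As a rough illustration of working with the two formats, the sketch below lists the files under both directories and reads them with the Python standard library. The helper names (`list_dialog_files`, `read_tsv_rows`, `read_json`) are hypothetical, and no column names or JSON keys are assumed; the documents describe the actual layout.

```python
import csv
import json
from pathlib import Path


def list_dialog_files(root):
    """Return (tsv_paths, json_paths) found under data/main/dialog in `root`."""
    dialog_dir = Path(root) / "data" / "main" / "dialog"
    tsv_paths = sorted((dialog_dir / "text").glob("*.tsv"))
    json_paths = sorted((dialog_dir / "json").glob("*.json"))
    return tsv_paths, json_paths


def read_tsv_rows(path):
    """Read one TSV file into a list of rows (each row is a list of strings).

    No header or column meaning is assumed here.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def read_json(path):
    """Load one JSON dialog file as plain Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `list_dialog_files(".")` run from the repository root should return the paths of both file sets, which can then be passed to the two readers.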

Please read the documents for more details.

Annotations

| Name | Doc | Data |
| --- | --- | --- |
| SCUD | Doc | data/main/scud_example/main.Example.jsonl, data/main/scud |
| Dialog act | Doc | data/main/dialog_act |
| Request spans | Doc | data/main/request_span |

The number of SCUDs is about 3,500.

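| Name | Utterance | SCUD | DA | RS |
| --- | --- | --- | --- | --- |
| Agent | さようでございますか。 (I see.) | | | |
| | それでは、駐車場を無料でご利用できるホテルをお探しします。 (Then I will look for hotels where you can use the parking lot for free.) | | | |
| | 立地ですが、観光地をまわりやすい場所はいかがでしょうか? (As for the location, how about somewhere convenient for visiting tourist spots?) | | | |
| User | はい、観光地をまわりやすい場所にあるといいですね。 (Yes, it would be nice if it were somewhere convenient for visiting tourist spots.) | ホテルが観光地をまわりやすい場所にあると良い。 (It would be good if the hotel is somewhere convenient for visiting tourist spots.) | はい (Yes) | |
| | ただ1番の目的は出雲大社なので、そこまでアクセスがよければ助かります。 (But my main purpose is Izumo Taisha, so good access to it would help.) | 【customer】の1番の目的が出雲大社だ。 (【customer】's main purpose is Izumo Taisha.) | 要求 (Request) | 出雲大社=>立地 (Izumo Taisha => location) |
| | | 出雲大社までアクセスが良いホテルだと良い。 (A hotel with good access to Izumo Taisha would be good.) | | アクセスがよければ=>立地 (if access is good => location) |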
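The RS column above writes each request span as `span=>slot`. A minimal sketch for splitting that display notation is given below; `parse_request_span` is a hypothetical helper, and the files under data/main/request_span may well store spans in a different form, so treat this only as an aid for reading the example.

```python
def parse_request_span(text):
    """Split a 'span=>slot' string, as shown in the example table, into a pair."""
    span, sep, slot = text.partition("=>")
    if not sep:
        raise ValueError(f"no '=>' separator in {text!r}")
    return span.strip(), slot.strip()


# The two request spans from the example above.
pairs = [parse_request_span(s) for s in ["出雲大社=>立地", "アクセスがよければ=>立地"]]
```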

Supplemental SCUD part: data/supplemental/scud: 57,447 examples

Files in data/supplemental/scud are supplemental fictitious dialogs with SCUD annotations. Please read the documents for more details.

  • Most dialogs consist of a single pair of an agent utterance and a user utterance.
  • The dialogs themselves are stored in files in data/supplemental/utterances (51,390 dialogs).

Supplemental correctness-labeled SCUD part: data/supplemental/correctness_labeled_scud: 8,115 examples

Files in data/supplemental/correctness_labeled_scud are supplemental fictitious dialogs annotated with SCUDs and their correctness. If the value of correct for an example is false, the example contains incorrect SCUDs.
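A minimal sketch of separating examples by this label follows. It assumes, without confirmation from this README, that each example is one JSON object per line (JSON Lines) with a boolean `correct` key; `split_by_correctness` is a hypothetical helper name.

```python
import json


def split_by_correctness(lines):
    """Split JSON Lines text into (correct_examples, incorrect_examples).

    Assumption: every non-empty line is a JSON object with a boolean
    `correct` key. A missing key raises KeyError rather than being guessed.
    """
    correct, incorrect = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        if example["correct"]:
            correct.append(example)
        else:
            incorrect.append(example)
    return correct, incorrect
```

An open file object can be passed directly as `lines`, since iterating over a text file yields one line at a time.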

Vanilla part: data/vanilla: 74,799 dialogs

Files in data/vanilla are fictitious dialogs or queries made by crowd workers with no SCUD annotations. Please read the documents for more details.

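| Utterance 1 | Utterance 2 |
| --- | --- |
| あなたが、高級ホテルに泊まるとしたらどのようなホテルに泊まりたいですか? (If you were to stay at a luxury hotel, what kind of hotel would you want to stay at?) | 食事と景色が美しく、バラ風呂などの工夫があるホテル (A hotel with beautiful food and scenery, and special touches such as a rose bath) |
| あなたが、1週間の国内旅行ができることになったら、どのような旅行をしたいですか? (If you could take a one-week domestic trip, what kind of trip would you want to take?) | ゆっくり読書をたのしむ旅行 (A trip where I can enjoy reading at a leisurely pace) |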

References

Dialog collection and SCUDs

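  1. Yuta Hayashibe. Self-Contained Utterance Description Corpus for Japanese Dialog. Proc. of LREC, pp. 1249-1255. 2022. (LREC 2022) [PDF]
  2. Yuta Hayashibe (林部祐太). 要約付き宿検索対話コーパス (Accommodation Search Dialog Corpus with Summaries). Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing (言語処理学会第27回年次大会論文集), pp. 340-344. 2021. (NLP 2021) [PDF]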

Dialog acts and request spans

  1. Hongjie Shi. A Span Extraction Approach for Dialog State Tracking: A Case Study in Hotel Booking Application. Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing (言語処理学会第27回年次大会論文集), pp. 1593-1598. 2021. (NLP 2021) [PDF]
  2. Hongjie Shi. A Sequence-to-sequence Approach for Numerical Slot-filling Dialog Systems. Proc. of SIGdial, pp. 272-277. 2020. (SIGdial 2020) [PDF]

License