Restaurants dataset

We based our experiments on a total of 1828 restaurants from Yelp.com across 4 cities. The list of restaurant IDs used in our study can be found at paper_data/restaurants_list.txt. For instance, the ID NX8VYnWFQ2ZY-H0HwvfPTw corresponds to the restaurant at https://www.yelp.com/biz/NX8VYnWFQ2ZY-H0HwvfPTw. For each restaurant, the top 20 reviews and top 20 popular dishes were used in our experiments.
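The ID-to-URL mapping above can be sketched in a few lines of Python. This assumes restaurants_list.txt contains one business ID per line, which is an assumption about the file layout rather than something the README states:

```python
# Sketch: map Yelp business IDs to their page URLs.
# Assumes paper_data/restaurants_list.txt holds one ID per line
# (file layout not confirmed by this README).
def yelp_url(business_id: str) -> str:
    """Build the Yelp page URL for a business ID."""
    return f"https://www.yelp.com/biz/{business_id}"

def load_restaurant_urls(path: str = "paper_data/restaurants_list.txt") -> list[str]:
    """Read the ID list and return one Yelp URL per restaurant."""
    with open(path) as f:
        return [yelp_url(line.strip()) for line in f if line.strip()]

# Example from the README:
# yelp_url("NX8VYnWFQ2ZY-H0HwvfPTw")
# -> "https://www.yelp.com/biz/NX8VYnWFQ2ZY-H0HwvfPTw"
```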

We collected 100 single-turn questions via Prolific. The results of the SUQL system can be found at paper_data/restaurants_single_turn_SUQL.csv. Here is a breakdown of what the columns mean:

  • Unique ID: a unique row identifier.
  • Utterance: the collected user question.
  • Predicted SUQL: the SUQL predicted with gpt-3.5-turbo-0613.
  • Agent response: the SUQL system's response.
  • 1st entity, 2nd entity, 3rd entity: the up-to-three returned entities.
  • Entity Annotation.
  • Whether Wrong Parse: whether this is a wrong SUQL parse.
  • Prolific ID: the Prolific ID associated with the submission.
  • Structural Unstructural annotation: whether the question requires only structured data or a combination, used for Table 4 in the paper.

The false positives from the SUQL system (based on annotation) can be found in paper_data/restaurants_single_turn_SUQL_fp.txt.
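As a minimal sketch of working with this file, the snippet below computes the fraction of wrong parses from the "Whether Wrong Parse" column. The column names follow the description above, but the exact header spelling and value format (e.g. yes/no) in the released CSV are assumptions; the toy inline sample only illustrates the shape:

```python
import csv
import io

def wrong_parse_rate(csv_text: str) -> float:
    """Fraction of rows annotated as a wrong SUQL parse.

    Assumes a "Whether Wrong Parse" column with yes/no-style values;
    the released file's exact header and value spellings may differ.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    wrong = sum(
        1 for r in rows
        if r["Whether Wrong Parse"].strip().lower() in ("yes", "true", "1")
    )
    return wrong / len(rows)

# Toy two-row sample with only the relevant columns (not real data):
sample = """Unique ID,Utterance,Whether Wrong Parse
1,Find me a cheap sushi place,no
2,Any romantic Italian spots?,yes
"""
# wrong_parse_rate(sample) -> 0.5
```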

We also collected 96 turns across 20 conversations. The collected dialog data and its annotation can be accessed here. Here is a breakdown of what the columns mean:

  • Unique ID: Unique row identifier. Each identifier is of format id,num, where the same id denotes the same conversation, and num denotes the turn number (0-indexed) in this conversation. We excluded turns that do not involve restaurant-related queries (e.g. chit-chatting, asking what the system can do, etc.).
  • Utterance: the collected user question.
  • Predicted SUQL: the SUQL predicted with gpt-3.5-turbo-0613. Green denotes a correct parse, and red denotes a wrong parse.
  • Agent response: the SUQL system's response.
  • 1st/2nd/3rd entity: the entities returned from the SUQL system. Each green cell denotes a true positive. Each red cell denotes a false positive. Alternatively, if a row does not contain returned entities and the row indeed involves searching for restaurants, a drop-down box denoting true or false negative is present. Within each conversation, the same entity is only counted once (the same entity appearing twice would not be colored).
  • Structural Unstructural annotation: whether the question requires only structured data or a combination, used for Table 4 in the paper.
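The "id,num" Unique ID format described above can be parsed to group turns by conversation; a minimal sketch (the example IDs are hypothetical):

```python
from collections import defaultdict

def group_turns(unique_ids: list[str]) -> dict[str, list[int]]:
    """Group turn numbers by conversation, given "id,num" identifiers.

    The same id denotes the same conversation; num is the 0-indexed
    turn number within that conversation.
    """
    convs: dict[str, list[int]] = defaultdict(list)
    for uid in unique_ids:
        conv_id, turn = uid.rsplit(",", 1)
        convs[conv_id].append(int(turn))
    return {cid: sorted(turns) for cid, turns in convs.items()}

# Hypothetical identifiers illustrating the format:
# group_turns(["3,0", "3,1", "7,0"]) -> {"3": [0, 1], "7": [0]}
```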