questions about the evaluation script #5

Closed
bozheng-hit opened this issue Oct 17, 2018 · 4 comments
Comments

@bozheng-hit

Hi Tao,
I evaluated the first example in gold_example.txt and pred_example.txt.
I want to know why the exact match result comes out to be 1.

The examples are:
gold: SELECT count(*) FROM singer|concert_singer
pred: select count(*) from stadium

The command I used is:
python evaluation.py --gold ./evaluation_examples/gold_small.txt --pred ./evaluation_examples/pred_small.txt --etype match --db ./database/ --table tables.json

Would you please give an explanation about this?

Best,
Bo Zheng

@taoyds
Owner

taoyds commented Oct 17, 2018

Hi Bo,

For this special case, the evaluation script doesn't take the table name into consideration. This happens only for * (here, in count(*)), because tables.json lists * as a single additional column for the whole database. We should have added * as an additional column for each table of the database in tables.json. However, modifying the inputs and code for all baselines and our syntaxSQL model would be too time-consuming for us.
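To make the effect concrete, here is a minimal sketch of the two ways * could be identified. It is not the code in evaluation.py; the function names and the tuple layout are assumptions made up for illustration. With a database-wide *, the table disappears from the column id, so count(*) over singer and count(*) over stadium compare equal.

```python
# Toy illustration only; names and data layout are hypothetical,
# not taken from evaluation.py or tables.json.

def column_id_db_wide(table, column):
    """Variant A: '*' is one column shared by the whole database."""
    return "*" if column == "*" else f"{table}.{column}"

def column_id_per_table(table, column):
    """Variant B: every table has its own '*' column."""
    return f"{table}.{column}"

def select_matches(gold, pred, column_id):
    """Compare two (aggregate, table, column) SELECT items via their column ids."""
    g_agg, g_table, g_col = gold
    p_agg, p_table, p_col = pred
    return g_agg == p_agg and column_id(g_table, g_col) == column_id(p_table, p_col)

gold = ("count", "singer", "*")   # SELECT count(*) FROM singer
pred = ("count", "stadium", "*")  # select count(*) from stadium

db_wide_result = select_matches(gold, pred, column_id_db_wide)
per_table_result = select_matches(gold, pred, column_id_per_table)
print(db_wide_result, per_table_result)
```

Under these assumptions, variant A reports a match and variant B does not, which mirrors the behavior described in this comment.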

As you know, the evaluation script can also report execution accuracy, which could get this example right.
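As a rough sketch of why an execution-based check can separate the two queries, the snippet below runs both against a small in-memory SQLite database. The schema and rows are invented for illustration and are not from the released databases, and the comparison function is a simplified stand-in rather than the script's own logic.

```python
import sqlite3

# Hypothetical toy database; table contents are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stadium (stadium_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO singer (name) VALUES ('a'), ('b'), ('c');
INSERT INTO stadium (name) VALUES ('x'), ('y');
""")

def same_execution_result(conn, gold_sql, pred_sql):
    """Run both queries and compare their result rows, ignoring row order."""
    gold_rows = conn.execute(gold_sql).fetchall()
    pred_rows = conn.execute(pred_sql).fetchall()
    return sorted(gold_rows) == sorted(pred_rows)

exec_match = same_execution_result(
    conn,
    "SELECT count(*) FROM singer",
    "select count(*) from stadium",
)
print(exec_match)
```

Note that this only distinguishes the queries when the two tables hold different numbers of rows; if the counts happened to coincide, the results would be identical.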

Best,
Tao

@taoyds
Owner

taoyds commented Oct 17, 2018

Hi Bo,

As we pointed out here, the evaluation script doesn't consider the DISTINCT keyword. The reason is that people very commonly add DISTINCT to a SQL query even when the corresponding natural language question gives no hint that DISTINCT is needed (we found this problem during our annotation). Thus, the evaluation script does not give 0 if the only difference between two SQL queries is DISTINCT.
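The idea can be sketched with a toy string-level comparison. This is an assumption-laden simplification: the helper name is hypothetical, the real script compares parsed query structures rather than raw strings, and lowercasing the whole query would also alter string literals.

```python
import re

def normalize_ignoring_distinct(sql):
    """Lowercase, drop the DISTINCT keyword, and collapse whitespace (toy normalizer)."""
    without_distinct = re.sub(r"\bdistinct\b\s*", "", sql.lower())
    return " ".join(without_distinct.split())

a = "SELECT DISTINCT name FROM singer"
b = "select name from singer"

# Queries that differ only by DISTINCT normalize to the same string.
distinct_only_diff = normalize_ignoring_distinct(a) == normalize_ignoring_distinct(b)
print(distinct_only_diff)
```

Any other difference, such as a different table or column, still leaves the normalized strings unequal.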

Best,
Tao

@bozheng-hit
Author

Hi Tao,
Since you are running a leaderboard now and the test set is not visible to us, I think it would be better to provide a correct evaluation. We have no idea how many test examples have the same problem.

Thanks for the quick reply.

Best,
Bo

@taoyds
Owner

taoyds commented Oct 25, 2018

Hi Bo,

We updated the evaluation script so that the first problem (count(*)) is fixed. For the DISTINCT case, we think that it is still reasonable to not include it in the evaluation.

Best,
Tao
