When looking at the results of `make data` in a clean repo clone, there is a small overlap in NL descriptions between the train and test datasets (and likewise between train and dev). After investigating, it appears that a single NL description can have multiple corresponding bash commands, and these pairs can get placed in different splits. The code in `data/scripts/split_data.py` seems to address this the wrong way round: it checks whether identical bash commands are placed in different splits. That would be appropriate when performing Bash2NL, but not for the other direction (NL to bash).
Since the number of descriptions with multiple commands is not that large, the overlap is small, so fixing it should lower the reported performance only slightly (I guesstimate around 1%, but I have not tried it). I figured you might still want to be aware of this.
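To make the suggestion concrete, here is a minimal sketch of splitting by NL description instead of by bash command, so that every command sharing a description lands in the same split. This is not the code in `data/scripts/split_data.py`; the function names, the `(nl, cmd)` pair format, the split ratios, and the lowercase/strip normalization are all assumptions for illustration.

```python
import random
from collections import defaultdict


def _norm(nl):
    # Hypothetical normalization; the real pipeline may compare descriptions differently.
    return nl.strip().lower()


def split_by_description(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign (nl, cmd) pairs to train/dev/test so that no NL description spans two splits."""
    # Group all commands under their (normalized) description.
    groups = defaultdict(list)
    for nl, cmd in pairs:
        groups[_norm(nl)].append((nl, cmd))

    # Shuffle the descriptions, not the individual pairs.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train = int(len(keys) * ratios[0])
    n_dev = int(len(keys) * ratios[1])
    buckets = {
        "train": keys[:n_train],
        "dev": keys[n_train:n_train + n_dev],
        "test": keys[n_train + n_dev:],
    }
    # Expand each bucket of descriptions back into its pairs.
    return {name: [p for k in ks for p in groups[k]] for name, ks in buckets.items()}


def nl_overlap(split_a, split_b):
    """Return the set of normalized NL descriptions that appear in both splits."""
    return {_norm(nl) for nl, _ in split_a} & {_norm(nl) for nl, _ in split_b}
```

Running `nl_overlap` on the existing train/test and train/dev outputs would also give an exact count of the leaked descriptions, which would replace my guess with a measured number.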