What is the relation between target and cwe in the dataset? #7
Comments
According to my understanding, both vulnerable and non-vulnerable functions come from vulnerability-fixing commits, and those commits can carry CWE information because they are mined from vulnerability datasets. A non-vulnerable (target == 0) function in the dataset may be extracted from (1) the fixed version of a vulnerable function, or (2) another function in a file touched by the fix that the fixing commit itself did not change. So in summary, there is no relation between "target" and the CVE/CWE information: the CWE ID comes from the commit, not from the individual function. Only functions with target == 1 have meaningful CWE information.

It also appears that DiverseVul was not preprocessed. I tried applying a CST parser (specifically tree-sitter) to the functions in DiverseVul and got lots of errors. Some functions seem to be truncated, and a few functions end with a "\r" (carriage return) character, which does not make sense (generally "\n" or "\r\n" is expected). So I tend to believe that this dataset presents raw source code that has not been preprocessed.
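The two symptoms mentioned above (a trailing bare "\r" and apparently truncated functions) can be screened for without a parser. This is only a rough sketch: it swaps tree-sitter for a plain brace-count heuristic, which can misfire on braces inside strings or comments, and the helper names and sample strings are made up for illustration.

```python
def ends_with_bare_cr(func_src):
    """True if the source ends with a lone carriage return.

    A "\r\n" ending is not flagged, because that string ends with "\n".
    """
    return func_src.endswith("\r")


def looks_truncated(func_src):
    """Crude truncation heuristic: opening and closing braces do not match.

    Assumption: braces inside string literals or comments are rare enough
    to ignore for a first pass; a real parser would be more reliable.
    """
    return func_src.count("{") != func_src.count("}")


# Hypothetical function bodies, not taken from the dataset.
samples = [
    "int f(void) {\n  return 0;\n}\n",             # well-formed
    "int g(void) {\n  return 1;\n}\r",             # ends with a bare "\r"
    "int h(int x) {\n  if (x) {\n    return x;\n",  # cut off mid-body
]

flags = [(ends_with_bare_cr(s), looks_truncated(s)) for s in samples]
for src, (bare_cr, truncated) in zip(samples, flags):
    print(repr(src[:20]), "bare_cr =", bare_cr, "truncated =", truncated)
```

Functions flagged by either check are candidates for manual inspection or for exclusion before parsing.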
Thanks for your answer!
This is correct. Thank you!
I think target means 'label', where 1 corresponds to 'vulnerable'.
In the dataset, some samples with target = 1 do not have a CWE label,
but other samples labeled 0 do have a CWE ID.
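The observation can be quantified with a small cross-tabulation of `target` against whether a CWE label is present. The sketch below assumes each record is a dict with `target` and `cwe` keys (names taken from this discussion) and that `cwe` is a possibly empty list, which is an assumption about the file layout; the sample records are invented.

```python
from collections import Counter


def crosstab_target_cwe(records):
    """Count records by (target, has_cwe).

    Assumption: a missing or empty "cwe" value means "no CWE label".
    """
    counts = Counter()
    for rec in records:
        has_cwe = bool(rec.get("cwe"))
        counts[(rec["target"], has_cwe)] += 1
    return counts


# Invented records for illustration only, not real dataset entries.
sample = [
    {"target": 1, "cwe": ["CWE-787"]},
    {"target": 1, "cwe": []},
    {"target": 0, "cwe": ["CWE-787"]},
    {"target": 0, "cwe": []},
    {"target": 0, "cwe": ["CWE-125"]},
]

counts = crosstab_target_cwe(sample)
for (target, has_cwe), n in sorted(counts.items()):
    print(f"target={target} has_cwe={has_cwe}: {n}")
```

Non-zero counts in both the (1, False) and (0, True) cells would confirm that the CWE field alone cannot be used as a stand-in for the label.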