This study examines the impact of identifier normalization on software vulnerability detection across three approaches: static analysis tools, specialized machine learning (ML) models, and large language models (LLMs). Using the BigVul dataset of vulnerabilities in C/C++ projects, the research evaluates the performance of these methods under normalized (generalized naming of variables/functions) and non-normalized conditions. Static analysis tools such as Flawfinder and CppCheck exhibit limited effectiveness (F1 scores ~0.1) and are unaffected by normalization. Specialized ML models, such as LineVul, achieve high F1 scores on non-normalized data (~0.9) but suffer significant performance drops when tested on normalized inputs, highlighting their lack of generalizability. In contrast, LLMs like Llama3, although underperforming in their pre-trained state, show substantial improvement after fine-tuning, achieving robust and consistent results across both normalized and non-normalized datasets. The findings suggest that while static analysis tools are less effective, fine-tuned LLMs hold strong potential for scalable and generalized vulnerability detection. The study recommends further exploration of hybrid approaches combining ML models, LLMs, and traditional tools to enhance accuracy and adaptability across diverse scenarios.
| Folder name | Folder content |
|---|---|
| fine-tuned | Results from the fine-tuned Llama3 models |
| input | Training, validation, and test data (compressed due to size limitations) |
| linevul | Results from the trained LineVul models |
| pre-trained | Results from a pre-trained Llama3 model |
| sast | Results from CppCheck and Flawfinder |
Result files follow these naming conventions:

- `train_<training_dataset>_eval_<evaluation_dataset>.csv` — results for a model trained on `<training_dataset>` and evaluated on `<evaluation_dataset>` (fine-tuned Llama3 and LineVul models)
- `eval_<evaluation_dataset>.csv` — results for the pre-trained Llama3 model evaluated on `<evaluation_dataset>`
- `<tool_name>_eval_<evaluation_dataset>.csv` — results for the static analysis tool `<tool_name>` evaluated on `<evaluation_dataset>`
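
As an illustration of these conventions, the minimal Python sketch below extracts the dataset and tool names encoded in the result file names. It is not part of the replication package: the regular expressions and the top-level glob are assumptions based only on the patterns and folder layout listed above.

```python
import re
from pathlib import Path

# Patterns derived from the naming conventions above.
# Order matters: the train_..._eval_... pattern must be tried before the
# generic <tool_name>_eval_... pattern, which would otherwise also match it.
PATTERNS = [
    re.compile(r"^train_(?P<train>.+)_eval_(?P<eval>.+)\.csv$"),  # trained models
    re.compile(r"^eval_(?P<eval>.+)\.csv$"),                      # pre-trained model
    re.compile(r"^(?P<tool>.+)_eval_(?P<eval>.+)\.csv$"),         # static analysis tools
]


def parse_result_filename(path: Path) -> dict:
    """Return the dataset/tool fields encoded in a result file name."""
    for pattern in PATTERNS:
        match = pattern.match(path.name)
        if match:
            return match.groupdict()
    raise ValueError(f"Unrecognized result file name: {path.name}")


if __name__ == "__main__":
    # Example usage from the repository root; folder layout per the table above.
    for csv_path in sorted(Path(".").glob("*/*.csv")):
        try:
            print(csv_path, parse_result_filename(csv_path))
        except ValueError as err:
            print(err)
```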