tuhh-softsec/Impact-of-Identifier-Normalization-on-Vulnerability-Detection-Techniques

Impact of Identifier Normalization on Vulnerability Detection Techniques

This study examines how identifier normalization affects software vulnerability detection across three approaches: static analysis tools, specialized machine learning (ML) models, and large language models (LLMs). Using the BigVul dataset of vulnerabilities in C/C++ projects, it evaluates each approach on normalized code (variable and function names replaced with generic ones) and on non-normalized code. Static analysis tools such as Flawfinder and Cppcheck show limited effectiveness (F1 scores of about 0.1) and are unaffected by normalization. Specialized ML models such as LineVul achieve high F1 scores on non-normalized data (about 0.9) but drop significantly when tested on normalized inputs, which points to limited generalizability. In contrast, LLMs such as Llama3 underperform in their pre-trained state but improve substantially after fine-tuning, achieving robust and consistent results on both normalized and non-normalized datasets. The findings suggest that, while static analysis tools are less effective, fine-tuned LLMs hold strong potential for scalable and generalized vulnerability detection. The study recommends further exploration of hybrid approaches that combine ML models, LLMs, and traditional tools to improve accuracy and adaptability across diverse scenarios.
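To make "identifier normalization" concrete, here is a minimal sketch of one way it could work. It is an illustration only, not the procedure used in the study: the function name `normalize_identifiers`, the `FUNC_n`/`VAR_n` placeholder scheme, and the regex-based approach are all assumptions. It renames every non-keyword identifier, does not skip string literals or comments, and also renames type names and library functions, which a real normalizer would likely handle differently.

```python
import re

# C keywords that must not be renamed (C89 set; illustrative, not exhaustive
# for newer standards or C++).
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

# \b keeps the pattern from matching inside numeric literals such as 0x1F.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def normalize_identifiers(code: str) -> str:
    """Replace identifiers with generic names (hypothetical scheme).

    An identifier whose first occurrence is directly followed by "(" is
    treated as a function (FUNC_n); everything else becomes VAR_n.
    """
    mapping = {}
    counters = {"FUNC": 0, "VAR": 0}

    def rename(match):
        name = match.group(0)
        if name in C_KEYWORDS:
            return name
        if name not in mapping:
            rest = code[match.end():].lstrip()
            kind = "FUNC" if rest.startswith("(") else "VAR"
            counters[kind] += 1
            mapping[name] = f"{kind}_{counters[kind]}"
        return mapping[name]

    return IDENTIFIER.sub(rename, code)


print(normalize_identifiers("int add(int a, int b) { return a + b; }"))
# prints: int FUNC_1(int VAR_1, int VAR_2) { return VAR_1 + VAR_2; }
```

The same original name always maps to the same placeholder, so the control and data flow of the function are preserved while the naming cues are removed.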

Folder structure

| Folder name | Folder content |
| --- | --- |
| fine-tuned | Results from the fine-tuned Llama3 models |
| input | Training, validation, and test data (compressed due to size limitations) |
| linevul | Results from the trained LineVul models |
| pre-trained | Results from a pre-trained Llama3 model |
| sast | Results from Cppcheck and Flawfinder |

File naming convention

fine-tuned and linevul

train_<training_dataset>_eval_<evaluation_dataset>.csv

pre-trained

eval_<evaluation_dataset>.csv

sast

<tool_name>_eval_<evaluation_dataset>.csv
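The naming conventions above can be turned into a small parser when collecting results programmatically. The sketch below is an assumption-laden illustration: the function name `parse_result_filename` is hypothetical, and the dataset and tool labels in the usage example (`normalized`, `original`, `flawfinder`) are placeholders, since the actual values used in the file names are not listed here.

```python
import re
from typing import Dict, Optional

# One pattern per results folder, following the conventions listed above.
_TRAIN_EVAL = re.compile(
    r"^train_(?P<training_dataset>.+?)_eval_(?P<evaluation_dataset>.+)\.csv$"
)
PATTERNS = {
    "fine-tuned": _TRAIN_EVAL,
    "linevul": _TRAIN_EVAL,
    "pre-trained": re.compile(r"^eval_(?P<evaluation_dataset>.+)\.csv$"),
    "sast": re.compile(
        r"^(?P<tool_name>.+?)_eval_(?P<evaluation_dataset>.+)\.csv$"
    ),
}


def parse_result_filename(folder: str, filename: str) -> Optional[Dict[str, str]]:
    """Return the fields encoded in a result file name, or None if the
    folder is unknown or the name does not follow the convention."""
    pattern = PATTERNS.get(folder)
    if pattern is None:
        return None
    match = pattern.match(filename)
    return match.groupdict() if match else None


# Placeholder file names; the real dataset and tool labels may differ.
print(parse_result_filename("linevul", "train_normalized_eval_original.csv"))
print(parse_result_filename("sast", "flawfinder_eval_normalized.csv"))
```

The non-greedy `.+?` splits on the first `_eval_` in the name, which is a design choice that only matters if a dataset or tool label itself contains `_eval_`.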
