README.md
+95 -4
@@ -18,7 +18,8 @@ Implementation of Devign Model in Python with code for processing the dataset an
18
18
* [Create Task](#create-task)
19
19
* [Embed Task](#embed-task)
20
20
* [Process Task](#process-task)
21
-
*[Roadmap](#Roadmap)
21
+
*[Results](#results)
22
+
*[Roadmap](#roadmap)
22
23
*[License](#license)
23
24
*[Contact](#contact)
24
25
*[Acknowledgements](#acknowledgements)
@@ -59,6 +60,7 @@ That can be done by changing the ```"slice_size"``` value under ```"create"``` i
59
60
needs to match ```"in_channels"```, under ```"devign" -> "model" -> "conv_args" -> "conv1d_1"```.
60
61
* The embedding size is equal to Word2Vec vector size plus 1.
61
62
* When executing the **Create** task, a directory named ```joern``` is created and deleted automatically under ```'project'\data\```.
63
+
* The dataset split for modeling during the **Process** task is done in ```src/data/datamanger.py```. The sets are balanced, and the train/val/test ratios are 0.8/0.1/0.1, respectively.
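
As an illustration of the split described in the last note, here is a minimal sketch in plain Python. It is not the code in ```src/data/datamanger.py``` (which works on Pandas data); it assumes that "balanced" means each label is split separately with the same 0.8/0.1/0.1 ratios, so that every set keeps the label proportions of the input. The function name `split_per_label` and the toy `(item, label)` samples are hypothetical.

```python
import random
from collections import defaultdict


def split_per_label(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (item, label) pairs into train/val/test, one label at a time.

    Assumption: splitting each label separately with the same ratios is
    what keeps the three sets balanced. This is a sketch, not the
    repository's implementation.
    """
    # Group the samples by their label.
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        # Whatever remains after train and val goes to test.
        test += group[n_train + n_val:]
    return train, val, test


# Toy data: 100 hypothetical functions, half labeled 0 and half labeled 1.
samples = [(f"func_{i}", i % 2) for i in range(100)]
train, val, test = split_per_label(samples)
print(len(train), len(val), len(test))
```

Splitting per label rather than over the whole dataset is what keeps the label proportions identical across the three sets, at the cost of a slightly uneven test size when a label count is not divisible by ten.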
62
64
### Setup
63
65
64
66
---
@@ -163,8 +165,8 @@ The dataset used is the [partial dataset](https://sites.google.com/view/devign)
163
165
The dataset is handled with Pandas and the file ```src/data/datamanger.py``` contains wrapper functions for the most essential operations.
164
166
<br/>
165
167
<br/>
166
-
A small sample from the original dataset is available for testing purposes.
167
-
The sample dataset contains functions from the **FFmpeg** project with maximum of 287 nodes per function.
168
+
A small sample of 994 entries from the original dataset is available for testing purposes.
169
+
The sample dataset contains functions from the **FFmpeg** project with a maximum of 287 nodes per function.
168
170
For each task, the necessary dataset files are available under the respective folders.
169
171
<br/>
170
172
<br/>
@@ -229,11 +231,53 @@ for the initial embeddings. The nodes embeddings are done as explained in the pa