This issue is primarily meant as a braindump/storm to collect ideas and remarks. Feel free to add your own!
Problem
Most programming platforms allow students to submit more than once for a given exercise. If that is the case, students who hand in plagiarized submissions will sometimes copy another student's solution as-is to confirm it is correct, and afterwards submit an altered version to hide the plagiarism.
To discover this kind of plagiarism, you can analyze all submissions for a single exercise. However, Dolos currently considers each file its own submission and will match files from the same student against each other. Since these submissions are often very similar, they create high-similarity pairs that drown out pairs between different students, reducing the effectiveness of the report.
Solution
First, there needs to be a way to communicate to Dolos which submissions belong to the same student. I see two different methods:
If the paths given as arguments to the CLI are directories, all submissions within the same directory could be considered to be from the same student.
When receiving input from an input.csv, a field student_id could tell Dolos which submissions belong together. As this is the output of Dodona's export format, there is no need for Dodona to change anything to support this.
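As a rough illustration of the two grouping methods above, here is a minimal Python sketch. It is not Dolos code: the function names are hypothetical, and the `filename` column is an assumption about how the CSV refers to files (only `student_id` is mentioned above).

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(paths):
    """Method 1: treat each parent directory as one student."""
    groups = defaultdict(list)
    for path in paths:
        groups[str(PurePosixPath(path).parent)].append(path)
    return dict(groups)


def group_by_csv(csv_text, id_field="student_id", file_field="filename"):
    """Method 2: group rows of a CSV by a student identifier column.

    `file_field` is an assumed column name, used for illustration only.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[id_field]].append(row[file_field])
    return dict(groups)
```

Both functions return the same shape (student identifier mapped to a list of files), so the rest of the pipeline would not need to know which method was used.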
Second, the Dolos algorithm should probably ignore matches between submissions of the same student and not generate Pairs between them.
However, it could be interesting to be able to view the changes a student made between subsequent submissions as well.
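One way to reconcile both wishes (hide same-student matches from the report, but keep them available for inspection) would be to split the candidate pairs instead of discarding them. The sketch below is an assumption about how that could look, not how Dolos generates pairs; `split_pairs` and `owner_of` are hypothetical names.

```python
from itertools import combinations


def split_pairs(owner_of):
    """Split all submission pairs into cross-student and same-student pairs.

    owner_of: dict mapping a submission name to its student identifier.
    Returns (cross, same), each a list of (submission_a, submission_b).
    """
    cross, same = [], []
    for a, b in combinations(sorted(owner_of), 2):
        if owner_of[a] == owner_of[b]:
            same.append((a, b))   # kept aside, e.g. to view a student's own changes
        else:
            cross.append((a, b))  # only these would be reported as plagiarism candidates
    return cross, same
```

Only the `cross` list would feed the similarity report; the `same` list could back a separate view showing how one student's submissions evolved.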
Finally, careful consideration is needed on how to implement this in the UI / CSV generation:
A Pair could become the "most similar pair" between two students' submissions. However, this could differ between each pair of students.
There should be a way to go through the individual submissions of each student and compare them separately. Often, submissions made before or after the most similar one contain useful information as well.
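The two points above could be combined by ranking all submission pairs per pair of students: the first entry serves as the "most similar pair", while the remaining entries stay available for individual comparison. This is a sketch under assumptions; the function name and the `(submission_a, submission_b, similarity)` tuple layout are hypothetical and do not reflect Dolos' actual data model.

```python
from collections import defaultdict


def rank_pairs_by_student(scored_pairs, owner_of):
    """Group scored submission pairs by the two students involved.

    scored_pairs: iterable of (submission_a, submission_b, similarity).
    owner_of: dict mapping a submission name to its student identifier.
    Returns a dict mapping a sorted (student, student) tuple to that
    couple's pairs, ordered from most to least similar.
    """
    by_students = defaultdict(list)
    for a, b, similarity in scored_pairs:
        student_a, student_b = owner_of[a], owner_of[b]
        if student_a == student_b:
            continue  # same-student pairs are not plagiarism candidates
        key = tuple(sorted((student_a, student_b)))
        by_students[key].append((a, b, similarity))
    return {
        key: sorted(pairs, key=lambda pair: pair[2], reverse=True)
        for key, pairs in by_students.items()
    }
```

A UI or CSV export could show only the first entry of each list by default and let the user expand the rest on demand.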
Sidenote
Currently at Aalto they are using labels to group students together. While this works surprisingly well, it does give some issues in the UI (e.g. a plagiarism graph that is very long).