Unified reports in a CSV file or another convenient format for analysis #174

Closed
Artanias opened this issue Sep 1, 2023 · 3 comments · Fixed by #176
Labels
enhancement (New feature or request), Major (Top priority task), Medium (A task that requires some effort to complete it), moevm (Adoption in the MOEVM educational process)

Comments

@Artanias
Collaborator

Artanias commented Sep 1, 2023

Is your feature request related to a problem? Please describe.
Currently every check produces its own result file, and as the number of checks grows, the number of files may become vast. So first we want to unify the results into a single file and choose a format that is more readable for humans than JSON. Then we want to select the required fields for every check result.

Describe the solution you'd like
Use the pandas Python library, which the codeplag util already uses.

Describe alternatives you've considered
Use a single JSON file instead of many, or look for another solution if problems arise.
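The following is a minimal sketch of the proposed unification. It assumes that each check currently writes one flat JSON object per file into a single reports directory. The directory layout, the `*.json` pattern, and the function name `merge_json_reports` are hypothetical and are not codeplag's actual API.

```python
import json
from pathlib import Path

import pandas as pd


def merge_json_reports(reports_dir: str, csv_path: str) -> pd.DataFrame:
    """Collect every per-check JSON report into one DataFrame and save it as CSV.

    Hypothetical layout: each *.json file holds a single flat object whose
    keys become the CSV columns.
    """
    records = []
    for path in sorted(Path(reports_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))
    report = pd.DataFrame(records)
    # ';' is used as the separator because list-valued fields contain commas.
    report.to_csv(csv_path, sep=";")
    return report
```

The same `;` separator appears in the pandas experiment later in this thread.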

Artanias added the enhancement, Major, Medium, and moevm labels Sep 1, 2023
Artanias added this to the v0.4.0 milestone Sep 1, 2023
Artanias self-assigned this Sep 1, 2023
@Artanias
Collaborator Author

Artanias commented Sep 1, 2023

It looks scary, but it parses.

>>> import pandas as pd
>>> data = pd.DataFrame({"date": "01/09/2023 19:20:44", "first_path": "https://github.com/Artanias/games/blob/master/Sudoku/Model_sudoku_tf.py", "second_path": "https://github.com/Artanias/games/blob/master/Sudoku/Model_num_tf.py", "first_heads": [["Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Expr", "Expr", "Expr"]], "second_heads": [["Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Expr", "Expr", "Expr"]], "jakkar": 1.0, "operators": 0.875, "keywords": 1.0, "literals": 0.9230769230769231, "weighted_average": 0.9632867132867134, "first_modify_date": "2020-07-25T11:52:12Z", "second_modify_date": "2020-08-04T07:40:29Z", "struct_similarity": 0.8804347826086957, "compliance_matrix": [[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]]})
>>> data.to_csv('data.csv', sep=';')
>>> pd.read_csv('data.csv', sep=';', index_col=0)
                  date                                         first_path  ... struct_similarity                                  compliance_matrix
0  01/09/2023 19:20:44  https://github.com/Artanias/games/blob/master/...  ...          0.880435  [[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], ...

[1 rows x 14 columns]
>>> pd.read_csv('data.csv', sep=';', index_col=0)['compliance_matrix'][0]
'[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]'
>>> import json
>>> json.loads(pd.read_csv('data.csv', sep=';', index_col=0)['compliance_matrix'][0])
[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]

I think we could tweak the column types, and then json.loads might not be needed.
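One way to skip the manual json.loads call is to have `read_csv` parse the list columns while loading, through its `converters` argument. This sketch uses `ast.literal_eval` instead of `json.loads`. The reason is that `to_csv` writes list cells in their Python `str()` form, so string lists such as `first_heads` come out with single quotes (see the CSV dump at the end of this thread), and that is not valid JSON. The helper name `read_report` is hypothetical. The column names are the ones listed below.

```python
import ast

import pandas as pd

# Columns holding Python lists that DataFrame.to_csv serialized via str().
LIST_COLUMNS = ("first_heads", "second_heads", "compliance_matrix")


def read_report(csv_path: str) -> pd.DataFrame:
    """Load a ';'-separated report and turn the list columns back into lists."""
    return pd.read_csv(
        csv_path,
        sep=";",
        index_col=0,
        # literal_eval safely parses strings like "['Assign', 'Expr']" and "[[[9, 9], ...]]".
        converters={col: ast.literal_eval for col in LIST_COLUMNS},
    )
```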

These are the fields so far:

>>> data.columns
Index(['date', 'first_path', 'second_path', 'first_heads', 'second_heads',
       'jakkar', 'operators', 'keywords', 'literals', 'weighted_average',
       'first_modify_date', 'second_modify_date', 'struct_similarity',
       'compliance_matrix'],
      dtype='object')

date - the date of the check. It may be useful for caching, because stale data is exactly the problem there; we might also store a hash.
first_path, second_path - paths to the first and second work respectively, either on the local system or on GitHub.
jakkar, operators, keywords, literals - results of the fast metrics.
weighted_average - the weighted average of the fast metrics.
first_modify_date, second_modify_date - the commit date from git. Not implemented for local files, because checks can also run without git, and that would need a tricky check.
first_heads, second_heads - names of the top-level objects of the first and second script respectively (e.g. function names). On their own they are useless; they become useful together with the result in compliance_matrix.
struct_similarity - the result of comparing the structures of the two works.
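Judging from the outputs later in this thread, each cell of compliance_matrix looks like a `[numerator, denominator]` pair, and the percentage in the console table is their ratio. In the CSV dump at the end, the first matrix row begins `[11, 11], [11, 11], [11, 14]`. The matching console row begins `100.00%, 100.00%, 78.57%`, and 11 / 14 ≈ 0.7857. Rows seem to follow first_heads and columns second_heads: the second CSV row has 2 first_heads, 3 second_heads, and a 2x3 matrix. This reading is inferred from the output and not taken from codeplag's source. Under that assumption, the table can be rebuilt from one report row as follows. `similarity_table` is a hypothetical helper name.

```python
import pandas as pd


def similarity_table(first_heads, second_heads, compliance_matrix) -> pd.DataFrame:
    """Rebuild the heads-by-heads similarity table from one report row.

    Assumption: each cell is a [numerator, denominator] pair, rows follow
    first_heads, and columns follow second_heads.
    """
    values = [[num / den for num, den in row] for row in compliance_matrix]
    return pd.DataFrame(values, index=first_heads, columns=second_heads)
```

To get the same percentage view as the console output, multiply the result by 100 and round it to two decimals.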

FYI, @mirrin00, @zmm.

@Artanias
Collaborator Author

Artanias commented Sep 3, 2023

Note that the current JSON approach does not bloat memory, but it does clutter a single folder with an ever-growing number of files, which is also bad. A CSV file, on the other hand, will eventually take up a lot of memory. So for now CSV is also only an intermediate step toward something better (a database).

For now I'll make it configurable: either JSON or CSV.
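Here is a minimal sketch of such a switch. It assumes the results are held in memory as a list of dicts. The function name `save_report`, the random JSON file names, and the append-by-rereading strategy for CSV are all hypothetical choices, not codeplag's actual interface.

```python
import json
import uuid
from pathlib import Path
from typing import Literal

import pandas as pd


def save_report(records: list[dict], target: str, fmt: Literal["json", "csv"]) -> None:
    """Save check results either as one JSON file per check or as one CSV table."""
    if fmt == "json":
        out_dir = Path(target)
        out_dir.mkdir(parents=True, exist_ok=True)
        # One file per check: simple, but the folder grows with every check.
        for record in records:
            (out_dir / f"{uuid.uuid4().hex}.json").write_text(json.dumps(record), encoding="utf-8")
    elif fmt == "csv":
        path = Path(target)
        report = pd.DataFrame(records)
        if path.exists():
            # Re-read the existing table so the index stays continuous after appending.
            previous = pd.read_csv(path, sep=";", index_col=0)
            report = pd.concat([previous, report], ignore_index=True)
        report.to_csv(path, sep=";")
    else:
        raise ValueError(f"unsupported report format: {fmt!r}")
```

The CSV branch reads the whole table on every save. That matches the memory concern above. Appending with `mode="a", header=False` would avoid the re-read, but the index would restart.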

@Artanias
Collaborator Author

Artanias commented Sep 9, 2023

@mirrin00, @zmm, here are two checks: one with an empty cache (a CSV file) and one with a populated cache:

root@380c03d96540:/usr/src/codeplag# codeplag --verbose check --extension py --mode one_to_one --directories src/ src/codeplag/ src/webparsers/
[WARNING] 14:37 - Env file not found or not a file. Trying to get token from environment.
[DEBUG] 14:37 - Starting codeplag util ...
[DEBUG] 14:37 - Mode: one_to_one; Extension: py.
[INFO] 14:37 - Starting searching for plagiarism ...
[DEBUG] 14:37 - Getting works features from src
[DEBUG] 14:37 - Getting works features from src/codeplag
[DEBUG] 14:37 - Getting works features from src/webparsers
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
[DEBUG] 14:38 - Time for all 6.30 s
[INFO] 14:38 - Ending searching for plagiarism ...
[DEBUG] 14:38 - Saving report to the file '/usr/src/codeplag/reports/codeplag_report.csv'
root@380c03d96540:/usr/src/codeplag# codeplag --verbose check --extension py --mode one_to_one --directories src/ src/codeplag/ src/webparsers/
[WARNING] 14:38 - Env file not found or not a file. Trying to get token from environment.
[DEBUG] 14:38 - Starting codeplag util ...
[DEBUG] 14:38 - Mode: one_to_one; Extension: py.
[INFO] 14:38 - Starting searching for plagiarism ...
[DEBUG] 14:38 - Getting works features from src
[DEBUG] 14:38 - Getting works features from src/codeplag
[DEBUG] 14:38 - Getting works features from src/webparsers
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
[DEBUG] 14:38 - Time for all 0.66 s
[INFO] 14:38 - Ending searching for plagiarism ...
[DEBUG] 14:38 - Nothing new to save to the csv report.
  1. As you can see, the time for repeated checks dropped substantially (6.30 s vs 0.66 s), at least for this simple example. Extracting the information from the files themselves, which we might also want to optimize, does not take much time either; there is a corresponding study: https://github.com/OSLL/code-plagiarism/blob/main/docs/notebooks/time_survey.py.ipynb. For code with 1000 meaningful lines, feature extraction takes 0.14 seconds on average.
  2. This option together with the one_to_one mode achieves exactly the goals requested for running on a folder of folders. As for the output: the branch already fixes works being checked against themselves, and stdout still shows some duplication, but the CSV will not contain it.
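The log line "Nothing new to save to the csv report" suggests that results are keyed by the pair of compared paths. So does the note that the CSV, unlike stdout, contains no duplicates. The sketch below shows one way to do that: it treats (first_path, second_path) and (second_path, first_path) as the same pair. This is an assumption about the intended behavior, and `pair_key` and `new_pairs_only` are hypothetical names.

```python
import pandas as pd


def pair_key(first_path: str, second_path: str) -> tuple:
    """Order-independent key: comparing (a, b) counts the same as comparing (b, a)."""
    return tuple(sorted((first_path, second_path)))


def new_pairs_only(existing: pd.DataFrame, candidates: pd.DataFrame) -> pd.DataFrame:
    """Keep only candidate rows whose path pair is not already in the report.

    Duplicates within `candidates` are dropped too, so a pair that appears
    twice in one run (as in the stdout above) is saved once.
    """
    seen = {pair_key(a, b) for a, b in zip(existing["first_path"], existing["second_path"])}
    keep = []
    for a, b in zip(candidates["first_path"], candidates["second_path"]):
        key = pair_key(a, b)
        keep.append(key not in seen)
        seen.add(key)
    return candidates.loc[keep]
```

An empty result means there is nothing to write, which would correspond to the "Nothing new to save" message. Otherwise, the returned rows can be appended to the report with `pd.concat`.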
root@380c03d96540:/usr/src/codeplag# batcat reports/codeplag_report.csv 
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: reports/codeplag_report.csv
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ ;date;first_modify_date;second_modify_date;first_path;second_path;first_heads;second_heads;jakkar;operators;keywords;literals;weighted_average;struct_similarity;compliance_matrix
   2   │ 0;09/09/2023 14:37:55;;;src/codeplag/consts.py;src/codeplag/consts.tmp.py;['AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAss
       │ ign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign'];['AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign
       │ ', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign'];1.0;1.0;1.0;1.0;1.0;1.0;[[[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 
       │ 19], [11, 19], [11, 19], [11, 25], [11, 50]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [11, 50]], [
       │ [11, 14], [11, 14], [14, 14], [14, 19], [14, 14], [11, 14], [14, 15], [11, 14], [13, 35], [11, 14], [13, 20], [13, 20], [13, 20], [14, 25], [13, 51]], [[11, 19], [11, 19], [14, 19], [19, 19], [14, 19
       │ ], [11, 19], [14, 20], [11, 19], [13, 40], [11, 19], [13, 25], [13, 25], [13, 25], [17, 27], [17, 52]], [[11, 14], [11, 14], [14, 14], [14, 19], [14, 14], [11, 14], [14, 15], [11, 14], [13, 35], [11,
       │  14], [13, 20], [13, 20], [13, 20], [14, 25], [13, 51]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [
       │ 11, 50]], [[11, 15], [11, 15], [14, 15], [14, 20], [14, 15], [11, 15], [15, 15], [11, 15], [13, 36], [11, 15], [13, 21], [13, 21], [13, 21], [15, 25], [13, 52]], [[11, 11], [11, 11], [11, 14], [11, 1
       │ 9], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [11, 50]], [[11, 34], [11, 34], [13, 35], [13, 40], [13, 35], [11, 34], [13, 36], [11, 34], [34
       │ , 34], [11, 34], [19, 34], [19, 34], [19, 34], [20, 39], [23, 61]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], 
       │ [11, 25], [11, 50]], [[11, 19], [11, 19], [13, 20], [13, 25], [13, 20], [11, 19], [13, 21], [11, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 19], [11, 19], [13, 
       │ 20], [13, 25], [13, 20], [11, 19], [13, 21], [11, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 19], [11, 19], [13, 20], [13, 25], [13, 20], [11, 19], [13, 21], [1
       │ 1, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 25], [11, 25], [14, 25], [17, 27], [14, 25], [11, 25], [15, 25], [11, 25], [20, 39], [11, 25], [18, 26], [18, 26],
       │  [18, 26], [25, 25], [25, 50]], [[11, 50], [11, 50], [13, 51], [17, 52], [13, 51], [11, 50], [13, 52], [11, 50], [23, 61], [11, 50], [19, 50], [19, 50], [19, 50], [25, 50], [50, 50]]]
   3   │ 1;09/09/2023 14:38:01;;;src/webparsers/async_github_parser.py;src/webparsers/github_parser.py;['AnnAssign', 'AsyncGithubParser'];['AnnAssign', 'AnnAssign', 'GitHubParser'];0.7352941176470589;0.801652
       │ 8925619835;0.6756756756756757;0.7941176470588235;0.7472148198934783;0.5582665695557174;[[[11, 11], [11, 11], [11, 2210]], [[8, 2037], [8, 2037], [1521, 2723]]]
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
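With all results in one file, ranking the most suspicious pairs takes only a short pandas query. This sketch assumes the `codeplag_report.csv` layout shown above. The 0.7 threshold and the helper name `top_pairs` are arbitrary illustrations.

```python
import pandas as pd


def top_pairs(csv_path: str, threshold: float = 0.7) -> pd.DataFrame:
    """Return pairs whose weighted_average reaches the threshold, most structurally similar first."""
    columns = ["first_path", "second_path", "weighted_average", "struct_similarity"]
    # usecols skips the heavy list columns such as compliance_matrix.
    report = pd.read_csv(csv_path, sep=";", usecols=columns)
    hits = report[report["weighted_average"] >= threshold]
    return hits.sort_values("struct_similarity", ascending=False).reset_index(drop=True)
```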

Artanias linked a pull request Sep 10, 2023 that will close this issue
Artanias moved this to All Completed in Codeplag tasks (CP) Jul 14, 2024