WIP: Classification based on subject and preview text #7918

Closed
wants to merge 5 commits into from

Conversation

st3iny
Member

@st3iny st3iny commented Jan 18, 2023

Highly experimental!

@st3iny st3iny self-assigned this Jan 18, 2023
@ChristophWurst
Member

> Highly experimental!

:shipit:

@ChristophWurst
Member

Test 1

[debug] found 12 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 2556 messages of which 504 are important
[debug] data set split into 1917 (i: 231) training and 639 (i: 273) validation sets with 1004 dimensions
[debug] classification report: {"recall":0.7216117216117216,"precision":0.7216117216117216,"f1Score":0.7216117216117216}
[debug] classifier validated: recall(important)=0.72161172161172, precision(important)=0.72161172161172 f1(important)=0.72161172161172
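
A note on reading the Test 1 report: recall, precision and F1 are all identical, which happens exactly when the number of false positives equals the number of false negatives. The sketch below assumes the report describes the "important" class and that `i: 273` is the number of important messages in the validation set. Under that assumption the reported recall corresponds to 197 true positives and 76 false negatives, and the equal precision implies 76 false positives. These counts are inferred from the log, not printed by it, and `precision_recall_f1` is an illustrative helper, not code from this PR.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 for one class from confusion counts."""
    precision = tp / (tp + fp)  # share of predicted-important that really are important
    recall = tp / (tp + fn)     # share of truly important messages that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts inferred (assumption) from Test 1: 273 important validation messages,
# recall 0.7216... -> tp = 197, fn = 76; precision == recall -> fp = 76.
precision, recall, f1 = precision_recall_f1(tp=197, fp=76, fn=76)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With `fp == fn` the two denominators are the same, so precision and recall coincide and their harmonic mean is the same value again.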

Test 2

[debug] found 12 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 4000 messages of which 1059 are important
[debug] data set split into 3000 (i: 500) training and 1000 (i: 559) validation sets with 1004 dimensions
[debug] classification report: {"recall":0.76207513416815742,"precision":0.60597439544807963,"f1Score":0.67511885895404111}
[debug] classifier validated: recall(important)=0.76207513416816, precision(important)=0.60597439544808 f1(important)=0.67511885895404

Test 3 with shorter word count vector = 50

[debug] found 12 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 4000 messages of which 1059 are important
[debug] data set split into 3000 (i: 500) training and 1000 (i: 559) validation sets with 54 dimensions
[debug] classification report: {"recall":0.77996422182468694,"precision":0.58681022880215339,"f1Score":0.66973886328725041}
[debug] classifier validated: recall(important)=0.77996422182469, precision(important)=0.58681022880215 f1(important)=0.66973886328725

Test 4 with word count vector = 10

[debug] found 12 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 4000 messages of which 1059 are important
[debug] data set split into 3000 (i: 500) training and 1000 (i: 559) validation sets with 14 dimensions
[debug] classification report: {"recall":0.6636851520572451,"precision":0.6081967213114754,"f1Score":0.6347305389221557}
[debug] classifier validated: recall(important)=0.66368515205725, precision(important)=0.60819672131148 f1(important)=0.63473053892216
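
To compare Tests 2 to 4 in absolute terms, the reported ratios can be turned back into message counts. This is a rough sketch, assuming the metrics refer to the "important" class and that `i: 559` is the number of important messages in the 1000-message validation set. `counts_from_report` is a hypothetical helper written for this comparison; the logs themselves do not print these counts.

```python
def counts_from_report(recall: float, precision: float, positives: int) -> tuple[int, int, int]:
    """Recover (tp, fp, fn) from recall, precision and the number of true positives + false negatives."""
    tp = round(recall * positives)   # recall = tp / positives
    fn = positives - tp              # important messages that were missed
    fp = round(tp / precision) - tp  # precision = tp / (tp + fp)
    return tp, fp, fn


VALIDATION_IMPORTANT = 559  # "(i: 559)" in the validation split, per the logs above

# (recall, precision) pairs copied from the classification reports.
reports = {
    "Test 2": (0.76207513416815742, 0.60597439544807963),
    "Test 3": (0.77996422182468694, 0.58681022880215339),
    "Test 4": (0.6636851520572451, 0.6081967213114754),
}

counts = {
    name: counts_from_report(recall, precision, VALIDATION_IMPORTANT)
    for name, (recall, precision) in reports.items()
}

for name, (tp, fp, fn) in counts.items():
    print(f"{name}: tp={tp} fp={fp} fn={fn}")
```

Read this way, shrinking the word count vector from the 1004-dimension run to 54 dimensions finds a few more important messages at the cost of more false positives, while 14 dimensions misses noticeably more of them.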

@st3iny st3iny force-pushed the enh/noid/classification-based-on-subject branch from c33634b to c201168 Compare January 24, 2023 08:49
@st3iny
Member Author

st3iny commented Mar 21, 2023

Closing because this PR is outdated.

@st3iny st3iny closed this Mar 21, 2023
@ChristophWurst ChristophWurst deleted the enh/noid/classification-based-on-subject branch March 22, 2023 09:08