Batch processing in training of NN ensemble - base project suggest calls #676

Draft: wants to merge 4 commits into base: main

juhoinkinen (Member) commented Feb 23, 2023

This PR experiments with implementing batched suggest calls for the base projects in the NN ensemble backend.

Unfortunately, there is no notable performance gain in real use, at least with MLLM, fastText, and Omikuji base projects (as in the YSO projects of Finto AI); in the 10-epoch run the wall time actually regressed, from 4:41 to 4:59. A performance gain is seen only when Omikuji is the sole base project: Omikuji is the only backend among the Finto AI YSO base models that implements the batch suggest method.
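
To illustrate why batching helps only backends with a native batch method, here is a minimal Python sketch. All class and function names in it (`BaseBackend`, `suggest_batch`, `collect_base_scores`, and so on) are hypothetical and are not Annif's actual API. The assumption is that a backend without its own batch implementation falls back to one `suggest` call per document, so grouping documents into batches saves it nothing, while a backend with a native batch method is called once per batch.

```python
from typing import Iterator, List, Sequence

Scores = List[float]


class BaseBackend:
    """Hypothetical base-project backend (not Annif's real class)."""

    def suggest(self, text: str) -> Scores:
        raise NotImplementedError

    def suggest_batch(self, texts: Sequence[str]) -> List[Scores]:
        # Default fallback: one suggest() call per document, so a backend
        # without a native batch method gains nothing from batching.
        return [self.suggest(text) for text in texts]


class LoopingBackend(BaseBackend):
    """Stands in for a backend with no batch implementation."""

    def __init__(self) -> None:
        self.calls = 0

    def suggest(self, text: str) -> Scores:
        self.calls += 1
        return [float(len(text))]


class NativeBatchBackend(BaseBackend):
    """Stands in for a backend that scores a whole batch in one call."""

    def __init__(self) -> None:
        self.calls = 0

    def suggest(self, text: str) -> Scores:
        return self.suggest_batch([text])[0]

    def suggest_batch(self, texts: Sequence[str]) -> List[Scores]:
        self.calls += 1  # a single call covers every document in the batch
        return [[float(len(text))] for text in texts]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_base_scores(
    backends: Sequence[BaseBackend], texts: Sequence[str], batch_size: int = 32
) -> List[List[Scores]]:
    """Return one row per document holding the score vector of each backend."""
    rows: List[List[Scores]] = []
    for batch in batched(texts, batch_size):
        # One suggest_batch() call per backend per batch.
        per_backend = [backend.suggest_batch(batch) for backend in backends]
        for doc_idx in range(len(batch)):
            rows.append([scores[doc_idx] for scores in per_backend])
    return rows


if __name__ == "__main__":
    looping, native = LoopingBackend(), NativeBatchBackend()
    docs = [f"document {i}" for i in range(100)]
    rows = collect_base_scores([looping, native], docs, batch_size=32)
    print(len(rows), looping.calls, native.calls)  # 100 100 4
```

With 100 documents and a batch size of 32, the looping backend is still called 100 times, while the natively batched one is called only 4 times. This mirrors the observation that only the Omikuji-only setup benefits.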

The results below are from runs on kj-kk using 16 jobs, training on corpora/fulltext-train/fi/*/.

MLLM, fastText, and Omikuji base projects

1000 docs, 1 epoch

| | user time (s) | wall time (m:ss) | max RSS |
|---|---|---|---|
| before (master) | 1268.63 | 2:25.07 | 14863072 |
| after (PR) | 1260.37 | 2:24.00 | 14791464 |

2000 docs, 10 epochs

| | user time (s) | wall time (m:ss) | max RSS |
|---|---|---|---|
| before (master) | 4205.64 | 4:41.01 | 15718876 |
| after (PR) | 4152.09 | 4:58.89 | 15712064 |

Omikuji base project only

2000 docs, 1 epoch

| | user time (s) | wall time (m:ss) | max RSS |
|---|---|---|---|
| before (master) | 511.38 | 1:23.13 | 7827672 |
| after (PR) | 336.72 | 1:13.68 | 7612384 |
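
The relative changes between the before and after rows can be checked with a few lines of Python. This is only arithmetic on the numbers in the tables, with wall times converted from m:ss to seconds; the run labels are shorthand introduced here.

```python
def to_seconds(wall: str) -> float:
    """Convert an 'm:ss.ss' wall-clock string to seconds."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# (user time, wall time) before and after the PR, copied from the tables.
RUNS = {
    "3 base projects, 1000 docs, 1 epoch": ((1268.63, "2:25.07"), (1260.37, "2:24.00")),
    "3 base projects, 2000 docs, 10 epochs": ((4205.64, "4:41.01"), (4152.09, "4:58.89")),
    "Omikuji only, 2000 docs, 1 epoch": ((511.38, "1:23.13"), (336.72, "1:13.68")),
}

for name, ((user_before, wall_before), (user_after, wall_after)) in RUNS.items():
    user = pct_change(user_before, user_after)
    wall = pct_change(to_seconds(wall_before), to_seconds(wall_after))
    print(f"{name}: user {user:+.1f}%, wall {wall:+.1f}%")
```

This prints roughly -0.7% user / -0.7% wall for the first run, -1.3% / +6.4% for the 10-epoch run, and -34.2% / -11.4% for the Omikuji-only run.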

codecov bot commented Feb 23, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change

Comparison is base (f280342) 99.57% compared to head (38c6784) 99.57%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #676   +/-   ##
=======================================
  Coverage   99.57%   99.57%           
=======================================
  Files          87       87           
  Lines        6157     6164    +7     
=======================================
+ Hits         6131     6138    +7     
  Misses         26       26           
| Impacted Files | Coverage Δ |
|---|---|
| annif/parallel.py | 100.00% <ø> (ø) |
| annif/backend/nn_ensemble.py | 100.00% <100.00%> (ø) |
| tests/test_backend_nn_ensemble.py | 100.00% <100.00%> (ø) |

sonarcloud bot commented Mar 8, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

No coverage information
0.0% duplication