Batch processing in training of NN ensemble - base project suggest calls #676

juhoinkinen · 2023-02-23T11:18:29Z

This PR experiments with implementing batched suggest calls for the base projects in the NN ensemble backend.

Unfortunately, there is no notable performance gain in real use, at least with MLLM, fastText, and Omikuji base projects (as in the YSO projects of Finto AI); the longer run below actually shows a wall-time regression. A performance gain is seen when using only Omikuji as the base project; it is the only backend among the Finto AI YSO base models that has the batch suggest method implemented.
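To illustrate why only backends with a native batch method can benefit, here is a minimal sketch of the batching idea. It is not the code in this PR and not Annif's actual API: the class names, `suggest_one`, `suggest_many`, and `suggest_corpus` are all hypothetical, and the "suggestions" are dummy values.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


class PerDocBackend:
    """Hypothetical backend that only implements a single-document suggest."""

    def suggest_one(self, text):
        return [len(text)]  # dummy "suggestion" for illustration


class BatchBackend(PerDocBackend):
    """Hypothetical backend that also implements a native batch suggest."""

    def suggest_many(self, texts):
        return [[len(t)] for t in texts]


def suggest_batch(backend, texts):
    """Use the native batch method if present, else loop over documents."""
    native = getattr(backend, "suggest_many", None)
    if native is not None:
        return native(texts)
    # Fallback: batching brings no speedup here, only extra bookkeeping.
    return [backend.suggest_one(t) for t in texts]


def suggest_corpus(backends, texts, batch_size=2):
    """Collect, per document, one result from every base backend."""
    results = []
    for chunk in batched(texts, batch_size):
        per_backend = [suggest_batch(b, chunk) for b in backends]
        results.extend(zip(*per_backend))  # regroup by document
    return results
```

In this sketch a backend without `suggest_many` still processes documents one at a time inside each batch, which is consistent with seeing a gain only in the Omikuji-only setup.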

The results below are from runs on kj-kk using 16 jobs, training on corpora/fulltext-train/fi/*/.

MLLM, fastText, and Omikuji base projects

1000 docs, 1 epoch

run	user time (s)	wall time (m:ss)	max rss (kB)
before (master)	1268.63	2:25.07	14863072
after (PR)	1260.37	2:24.00	14791464

2000 docs, 10 epochs

run	user time (s)	wall time (m:ss)	max rss (kB)
before (master)	4205.64	4:41.01	15718876
after (PR)	4152.09	4:58.89	15712064

Omikuji base project only

2000 docs, 1 epoch

run	user time (s)	wall time (m:ss)	max rss (kB)
before (master)	511.38	1:23.13	7827672
after (PR)	336.72	1:13.68	7612384
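For easier comparison, the relative changes can be computed from the figures in the tables above. This is a small helper sketch; the function names and the short run labels are my own, and it assumes the wall times are in m:ss format.

```python
def to_seconds(wall):
    """Convert a 'm:ss.ss' wall-time string to seconds."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_change(before, after):
    """Relative change in percent; negative means the PR is faster."""
    return (after - before) / before * 100


# (user time before, user time after, wall before, wall after),
# copied from the tables above.
runs = {
    "3 base projects, 1000 docs, 1 epoch": (1268.63, 1260.37, "2:25.07", "2:24.00"),
    "3 base projects, 2000 docs, 10 epochs": (4205.64, 4152.09, "4:41.01", "4:58.89"),
    "Omikuji only, 2000 docs, 1 epoch": (511.38, 336.72, "1:23.13", "1:13.68"),
}

changes = {}
for name, (user_before, user_after, wall_before, wall_after) in runs.items():
    changes[name] = (
        pct_change(user_before, user_after),
        pct_change(to_seconds(wall_before), to_seconds(wall_after)),
    )
    print(f"{name}: user {changes[name][0]:+.1f}%, wall {changes[name][1]:+.1f}%")
```

With these figures the three-project runs change by about a percent in user time, the 10-epoch run is roughly 6% slower in wall time, and the Omikuji-only run is roughly 34% lower in user time and 11% lower in wall time.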

codecov · 2023-02-23T11:24:22Z

Codecov Report

Patch coverage: 100.00% and no project coverage change

Comparison is base (f280342) 99.57% compared to head (38c6784) 99.57%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #676   +/-   ##
=======================================
  Coverage   99.57%   99.57%           
=======================================
  Files          87       87           
  Lines        6157     6164    +7     
=======================================
+ Hits         6131     6138    +7     
  Misses         26       26

Impacted Files	Coverage Δ
annif/parallel.py	`100.00% <ø> (ø)`
annif/backend/nn_ensemble.py	`100.00% <100.00%> (ø)`
tests/test_backend_nn_ensemble.py	`100.00% <100.00%> (ø)`


sonarcloud · 2023-03-08T12:23:39Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
1 Code Smell

No Coverage information
0.0% Duplication

juhoinkinen added 2 commits February 22, 2023 18:19

Add test for training NN ensemble with 2 base projects

c31885e

Utilize batching in suggestions from base projects

7f762c2

juhoinkinen added 2 commits March 8, 2023 14:19

Merge branch 'master' into batching-in-nn-ensemble

1dfc331

Remove now unused single-doc suggest method in parallel.py

38c6784

This was referenced Apr 14, 2023

Refactor: Represent suggestion results as sparse arrays #681

Merged

Better support for suggestion batches in NN ensemble #687

Open
