[Umbrella Ticket] Make ML path selection better: improve accuracy, speed up the inference and reduce the size of jar #703

Open
3 of 5 tasks
amandelpie opened this issue Aug 10, 2022 · 1 comment
Labels
comp-summaries: Something related to the method names, code comments and display names generation
ctg-documentation: Improvements or additions to documentation
ctg-enhancement: New feature, improvement or change request

amandelpie commented Aug 10, 2022

This is an umbrella ticket for many sub-tasks.

Description

The existing ML path selection is implemented in the utbot-analytics module.
It suffers from several problems:

  1. It uses external ML libraries for model inference, which makes the jar large.
  2. It uses the Smile library for inference (it would be better to use scikit-learn and provide a model importer).
  3. The Smile wrapper for BLAS is used for matrix multiplication.
  4. A Kotlin implementation without an external runtime is too slow (we need our own native implementation of 1-3 operations, such as matrix multiplication); multik could probably help.
  5. DJL inference is too slow.
  6. The imported model is stored in JSON/txt format.
  7. We measure the metrics on the contest data.
  8. The utbot-analytics module is de facto unused.
  9. There are a lot of ML-related settings mixed together with other settings in UtSettings.
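
To make problem 4 concrete, here is a minimal sketch of what dependency-free inference could look like, assuming the model is a small feed-forward network. The names `DenseLayer` and `predict` are hypothetical and do not come from utbot-analytics. It only illustrates the matrix-vector product that a pure Kotlin implementation would have to run quickly.

```kotlin
// Hypothetical sketch: none of these names come from utbot-analytics.

/** One fully connected layer; weights are stored row-major in a flat array. */
class DenseLayer(
    val inputSize: Int,
    val outputSize: Int,
    val weights: FloatArray, // outputSize * inputSize values, row-major
    val bias: FloatArray     // outputSize values
) {
    init {
        require(weights.size == inputSize * outputSize) { "weights size mismatch" }
        require(bias.size == outputSize) { "bias size mismatch" }
    }

    /** Computes W * x + b, followed by ReLU when [applyRelu] is true. */
    fun forward(x: FloatArray, applyRelu: Boolean): FloatArray {
        require(x.size == inputSize) { "input size mismatch" }
        val out = FloatArray(outputSize)
        for (row in 0 until outputSize) {
            var sum = bias[row]
            val offset = row * inputSize
            for (col in 0 until inputSize) {
                sum += weights[offset + col] * x[col]
            }
            out[row] = if (applyRelu && sum < 0f) 0f else sum
        }
        return out
    }
}

/** Runs the layers in order; ReLU is applied to every layer except the last. */
fun predict(layers: List<DenseLayer>, features: FloatArray): FloatArray {
    var current = features
    layers.forEachIndexed { index, layer ->
        current = layer.forward(current, applyRelu = index < layers.lastIndex)
    }
    return current
}
```

Keeping the weights in a flat `FloatArray` avoids boxing and nested-array indirection, which is usually the first thing to try before reaching for a native implementation or multik.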

Expected behavior

  1. The utbot-analytics module and its inheritors should be easy to enable or disable from the intellij/cli modules.
  2. Training scripts should be structured and isolated.
  3. Deployed ML models should be part of the jar.
  4. There are no external libraries in the utbot-analytics module.
  5. ML-related settings should be extracted from UtSettings into UtMLSettings.
  6. Models are located in resources and packed with the plugin.
  7. Models are no larger than 100 KB (zipped or saved in an alternative binary format, not JSON or txt).
  8. The utbot-analytics module contains only interfaces and pure Kotlin implementations.
  9. utbot contains separate modules for custom inference implementations (like DJL).
  10. Different path selectors can be easily compared, and the results can be displayed as a report.
  11. New path selection metrics are created.
  12. We reach significantly better numbers in the metrics.
  13. The obtained models are ranked and well described.
  14. The training process and hyperparameter tuning are well described and published.
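
Items 6 and 7 above could be approached with a compact binary encoding instead of JSON or txt. The following is only a sketch under assumptions: the layout (an `Int` count followed by that many `Float` values, gzip-compressed) and the function names `encodeWeights` and `decodeWeights` are hypothetical, not an existing UTBot format.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Hypothetical layout: [int count][float * count], gzip-compressed.

/** Serializes weights into the compressed binary layout described above. */
fun encodeWeights(weights: FloatArray): ByteArray {
    val bytes = ByteArrayOutputStream()
    // Closing the outer stream finishes the gzip trailer before we read the bytes.
    DataOutputStream(GZIPOutputStream(bytes)).use { out ->
        out.writeInt(weights.size)
        for (w in weights) out.writeFloat(w)
    }
    return bytes.toByteArray()
}

/** Reads weights back from the compressed binary layout. */
fun decodeWeights(encoded: ByteArray): FloatArray =
    DataInputStream(GZIPInputStream(ByteArrayInputStream(encoded))).use { input ->
        val count = input.readInt()
        FloatArray(count) { input.readFloat() }
    }
```

In the plugin, the encoded bytes would presumably be read from a classpath resource (for example via `getResourceAsStream`) rather than from an in-memory array, and the 100 KB budget could be enforced by a build-time check on the resource size.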

Related issues

korifey moved this to Todo in UTBot Java on Aug 10, 2022
amandelpie added the ctg-documentation and ctg-enhancement labels on Aug 10, 2022
amandelpie self-assigned this on Aug 17, 2022
amandelpie changed the title from "Make ML path selection better: improve accuracy, speed up the inference and reduce the size of jar" to "[Umbrella Ticket] Make ML path selection better: improve accuracy, speed up the inference and reduce the size of jar" on Aug 30, 2022
amandelpie added this to the "ML Path Selection. Phase I" milestone on Sep 14, 2022
amandelpie commented

This was a cool idea, but it is research that requires a lot of time. Unfortunately, that time was not found, and further work depends on the success of custom path selectors.

alisevych added the comp-summaries label on Apr 11, 2023
Projects
Status: Todo
Development

No branches or pull requests

2 participants