Adding new CLF #46

Merged 9 commits into main from clf on Aug 22, 2021

Lundez (Member) commented Aug 7, 2021

  • Added CountVectorizer (sparse, dense and hash-based variants)
  • Added TfIdfTransform (+ Vectorizer)
  • Added Bm25Transform (+ TODO: Vectorizer)
  • Updated Embeddings to use Multik almost everywhere (about 99 %); only SVD still uses EJML
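
For readers unfamiliar with the transforms listed above, here is a minimal sketch of what count vectorization (dense, sparse and hashed), TF-IDF and BM25 weighting compute. It uses only the Kotlin standard library on the JVM. Every function name below is hypothetical and chosen for illustration; none of it is the API added in this PR. The smoothed IDF and the BM25 formula with `k1 = 1.2` and `b = 0.75` are common textbook choices assumed here, not necessarily what the PR implements.

```kotlin
import kotlin.math.ln

// Hypothetical helpers for illustration only; not the API added in this PR.

/** Lowercasing whitespace tokenizer; a real tokenizer would handle punctuation. */
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }

/** Maps each token to a column index, in order of first appearance. */
fun buildVocabulary(docs: List<String>): Map<String, Int> {
    val vocab = LinkedHashMap<String, Int>()
    for (doc in docs) for (token in tokenize(doc)) vocab.getOrPut(token) { vocab.size }
    return vocab
}

/** Dense variant: one slot per vocabulary entry; unknown tokens are ignored. */
fun countDense(doc: String, vocab: Map<String, Int>): DoubleArray {
    val row = DoubleArray(vocab.size)
    for (token in tokenize(doc)) vocab[token]?.let { row[it] += 1.0 }
    return row
}

/** Sparse variant: only the non-zero columns are stored. */
fun countSparse(doc: String, vocab: Map<String, Int>): Map<Int, Double> =
    tokenize(doc).mapNotNull { vocab[it] }
        .groupingBy { it }.eachCount()
        .mapValues { it.value.toDouble() }

/** Hashed variant: no vocabulary, tokens land in a fixed number of buckets. */
fun countHashed(doc: String, buckets: Int): DoubleArray {
    val row = DoubleArray(buckets)
    for (token in tokenize(doc)) row[Math.floorMod(token.hashCode(), buckets)] += 1.0
    return row
}

/** TF-IDF with a smoothed IDF, ln((1 + n) / (1 + df)) + 1 (assumed variant). */
fun tfIdf(counts: List<DoubleArray>): List<DoubleArray> {
    val n = counts.size
    val idf = DoubleArray(counts.first().size) { j ->
        val df = counts.count { it[j] > 0.0 }
        ln((1.0 + n) / (1.0 + df)) + 1.0
    }
    return counts.map { row -> DoubleArray(row.size) { j -> row[j] * idf[j] } }
}

/** BM25 term weights; expects a non-empty corpus with at least one token. */
fun bm25(counts: List<DoubleArray>, k1: Double = 1.2, b: Double = 0.75): List<DoubleArray> {
    val n = counts.size
    val avgLen = counts.map { it.sum() }.average()
    val idf = DoubleArray(counts.first().size) { j ->
        val df = counts.count { it[j] > 0.0 }
        ln(1.0 + (n - df + 0.5) / (df + 0.5))
    }
    return counts.map { row ->
        // Length normalisation: longer-than-average documents are damped.
        val norm = k1 * (1.0 - b + b * row.sum() / avgLen)
        DoubleArray(row.size) { j -> idf[j] * row[j] * (k1 + 1.0) / (row[j] + norm) }
    }
}

fun main() {
    val docs = listOf("the cat sat", "the dog sat down")
    val vocab = buildVocabulary(docs)
    val counts = docs.map { countDense(it, vocab) }
    println(vocab)
    println(counts.map { it.toList() })
    println(tfIdf(counts).map { it.toList() })
    println(bm25(counts).map { it.toList() })
}
```

The hashed variant trades exactness for a fixed memory footprint: distinct tokens can collide in one bucket, which is why it needs no vocabulary pass.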

TODO:

  • Tests
  • Fix docs (medium; still TODO: the text is there, but the runnable code examples are not)
  • Extract unused code into issues / new PRs

Next PR:

  • Add simplified usage of common formats (e.g. Penn Treebank for POS)
  • Improve integration with kotlin.dataframe
  • Improve the overall API related to ML
  • Add Pipeline
  • Bug in Multik when using .map on a single element with a changing type!!
  • Add a spaCy-like API

Lundez (Member, Author) commented Aug 20, 2021

@denkhan care to take a look?

Lundez (Member, Author) commented Aug 22, 2021

@denkhan should I merge? 🥳

@Lundez Lundez merged commit 919b1e6 into main Aug 22, 2021
@Lundez Lundez deleted the clf branch December 27, 2021 09:19