- Updated some BM25Vectorizer method types to match the implementation — thanks to @pavloDeshko ✅
- Detokenization now restores em, en, third, quarter, thin, hair, and medium math space characters as well as narrow no-break spaces, in addition to the regular non-breaking space. 👏 🙌 🛰️
- `.contextualVectors()` now throws an error if (a) word vectors are not loaded, and (b) with `lemma: true`, "pos" is missing in the NLP pipe. 🤓
- Refined TypeScript definitions further. ✅
- Added missing TypeScript definitions for word embeddings, along with a few other TypeScript fixes. ✅
- Detokenization restores both regular and non-breaking spaces to their original positions. 🤓
- You can now use `similarity.vector.cosine( vectorA, vectorB )` to compute the similarity between two vectors on a scale of 0 to 1 (see the sketch below). 🤓
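A minimal sketch, assuming the similarity utility is required from `wink-nlp/utilities/similarity.js`; the two vectors below are placeholders for vectors produced via the embedding helpers described in the next entries.

```js
// Minimal sketch: cosine similarity of two numeric vectors on a 0 to 1 scale.
// The vectors here are placeholders; in practice they would come from the
// word-embedding helpers (e.g. as.vector) described below.
const similarity = require( 'wink-nlp/utilities/similarity.js' );

const vectorA = [ 0.12, 0.34, 0.56, 0.78 ];
const vectorB = [ 0.10, 0.33, 0.50, 0.80 ];

console.log( similarity.vector.cosine( vectorA, vectorB ) );
// -> a number between 0 (dissimilar) and 1 (identical direction)
```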
- Seamless word embedding integration enhances winkNLP's semantic capabilities. 🎉 👏 🙌
- Pre-trained 100-dimensional word embeddings for over 350,000 English words released: wink-embeddings-sg-100d. 💯
- API remains unchanged — no code updates needed for existing projects. The new APIs, illustrated in the sketch after this list, include: 🤩
  - Obtain the vector for a token: use the `.vectorOf( token )` API.
  - Compute sentence/document embeddings: employ the `as.vector` helper by calling `.out( its.lemma, as.vector )` on the tokens of a sentence or document. You can also use `its.value` or `its.normal`. Tokens can be pre-processed to remove stop words etc. using the `.filter()` API. Note that the `as.vector` helper uses an averaging technique.
  - Generate contextual vectors: leverage the `.contextualVectors()` method on a document. Useful for pure browser-side applications! Generate custom vectors contextually relevant to your corpus and use them in place of the larger pre-trained wink embeddings.
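A minimal sketch of the workflow above, assuming the `wink-embeddings-sg-100d` package is installed and that the embeddings are passed as the third argument to `winkNLP()`:

```js
// Minimal sketch: document embeddings via as.vector, compared with
// similarity.vector.cosine(). Assumes wink-eng-lite-web-model and
// wink-embeddings-sg-100d are installed.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const embeddings = require( 'wink-embeddings-sg-100d' );
const similarity = require( 'wink-nlp/utilities/similarity.js' );

// Embeddings are supplied along with the model and the pipe.
const nlp = winkNLP( model, [ 'sbd', 'pos' ], embeddings );
const its = nlp.its;
const as = nlp.as;

const docA = nlp.readDoc( 'The cat sat on the mat.' );
const docB = nlp.readDoc( 'A kitten rested on the rug.' );

// as.vector averages the token vectors into a single embedding.
const vectorA = docA.tokens().out( its.lemma, as.vector );
const vectorB = docB.tokens().out( its.lemma, as.vector );

console.log( similarity.vector.cosine( vectorA, vectorB ) );
```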
- Comprehensive documentation along with interesting examples is coming up shortly. Stay tuned for updates! 😎
- Added a live example for how to run winkNLP on Deno. 👍
- Parameters of `markup()` are now optional in TS code — squashed a TypeScript declaration bug. 🙌
- Fixed a TypeScript declaration. ✅
- You can now use the `its.sentenceWiseImportance` helper to obtain the sentence-wise importance (on a scale of 0 to 1) of a document, if it is supported by the language model (see the sketch below). 📚📊🤓
- Check out the live example: How to visualize key sentences in a document? 👀
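A minimal sketch, assuming the loaded language model supports sentence-wise importance:

```js
// Minimal sketch: sentence-wise importance of a document.
// Assumes the loaded language model supports its.sentenceWiseImportance.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const doc = nlp.readDoc(
  'Rainfall was normal this year. The reservoirs are full. Therefore, no water cuts are expected.'
);

// One entry per sentence, each carrying its importance on a 0 to 1 scale.
console.log( doc.out( its.sentenceWiseImportance ) );
```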
- Some behind-the-scenes model improvements. 😎 🤓
- Added clarity on TypeScript configuration in the README. ✅
- Mark now allows marking w.r.t. the last element of the pattern. For example, if a pattern matches "a fluffy cat", then `mark: [-2, -1]` will extract "fluffy cat" — especially useful when the match length is unknown (see the sketch below). 💃
- Improved error handling while processing mark's arguments. 🙌
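A minimal sketch of negative `mark` indices; the POS-based pattern and entity name below are illustrative, not taken from the release itself.

```js
// Minimal sketch: mark with negative indices keeps only the trailing
// tokens of each custom-entity match. Pattern and name are illustrative.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

nlp.learnCustomEntities( [
  // Matches a determiner-adjective-noun sequence such as "a fluffy cat";
  // mark: [-2, -1] extracts only the last two tokens, i.e. "fluffy cat".
  { name: 'adjNounPair', patterns: [ '[DET] [ADJ] [NOUN]' ], mark: [ -2, -1 ] }
] );

const doc = nlp.readDoc( 'I saw a fluffy cat near the door.' );
console.log( doc.customEntities().out( its.value ) );
// -> e.g. [ 'fluffy cat' ]
```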
- README is now more informative and links to examples and benchmarks. 👍
- Benchmarked on the latest machine and browser versions. 🖥
- Fixed an incorrect install command in the README. ✅
- Ready for the future — we have tested winkNLP, including its models, on Node.js version 18. 🙌 🎉
- winkNLP earned Open Source Security Foundation (OpenSSF) Best Practices passing badge. 🎉 👏 🙌
- The `.bowOf()` API of BM25Vectorizer now supports processing of OOV tokens — useful for cosine similarity computation. 😎
- Document has a new API — `.pipeConfig()` to inquire about the active processing pipeline (see the sketch below).
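A minimal sketch of `.pipeConfig()`; the shape of its return value is not spelled out in the release note, so treat the logged output as indicative only.

```js
// Minimal sketch: inspect which annotation stages were active for a document.
// The exact shape of pipeConfig()'s return value is an assumption here.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );

// Only sentence boundary detection and POS tagging in the pipe.
const nlp = winkNLP( model, [ 'sbd', 'pos' ] );
const doc = nlp.readDoc( 'Hello world! How are you?' );

console.log( doc.pipeConfig() );
```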
- Obtain the bag-of-words for a tokenized text from BM25Vectorizer using the `.bowOf()` API — useful for bow-based similarity computation (see the sketch below). 👍
- `learnCustomEntities()` displays a console warning if a complex shorthand pattern is likely to cause a learning/execution slowdown. 🤞❗️
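A minimal sketch of `.bowOf()` feeding a bag-of-words cosine similarity; the tiny corpus and require paths are illustrative.

```js
// Minimal sketch: learn a BM25 model over a tiny corpus, then compare two
// texts via their bag-of-words. Corpus and paths are illustrative.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const BM25Vectorizer = require( 'wink-nlp/utilities/bm25-vectorizer.js' );
const similarity = require( 'wink-nlp/utilities/similarity.js' );

const nlp = winkNLP( model );
const its = nlp.its;
const bm25 = BM25Vectorizer();

const corpus = [
  'Bach composed many fugues.',
  'Mozart composed many operas.',
  'Bach and Mozart were composers.'
];
corpus.forEach( ( text ) => bm25.learn( nlp.readDoc( text ).tokens().out( its.normal ) ) );

// Bag-of-words of any tokenized text, weighted by the learned model.
const bowA = bm25.bowOf( nlp.readDoc( 'Bach composed fugues.' ).tokens().out( its.normal ) );
const bowB = bm25.bowOf( nlp.readDoc( 'Mozart composed operas.' ).tokens().out( its.normal ) );

console.log( similarity.bow.cosine( bowA, bowB ) );
```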
- Easily load BM25Vectorizer's model using the newly introduced `.loadModel()` API (see the sketch below). 🎉
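A minimal sketch of saving and restoring a vectorizer; exporting the trained model via `out( its.modelJSON )` is an assumption here, not something stated in this release note.

```js
// Minimal sketch: train a BM25Vectorizer, export its model, and hydrate a
// fresh vectorizer from it. The its.modelJSON export shown is an assumption.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const BM25Vectorizer = require( 'wink-nlp/utilities/bm25-vectorizer.js' );

const nlp = winkNLP( model );
const its = nlp.its;

const trained = BM25Vectorizer();
[ 'The first document.', 'The second document.' ].forEach( ( text ) =>
  trained.learn( nlp.readDoc( text ).tokens().out( its.normal ) )
);
const json = trained.out( its.modelJSON ); // assumed export helper

// Elsewhere (e.g. in the browser), load the previously saved model.
const restored = BM25Vectorizer();
restored.loadModel( json );
```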
- We have enhanced TypeScript support to allow easy addition of new TypeScript-enabled language models. 👏
- Added a naive wikification showcase in the README. 😎
- Included NLP Pipe details in the README file. 🤓
- We have added support for TypeScript. 🙌🎉
- Some behind-the-scenes updates & fixes. 😎🤓
- Improved documentation. 📚🤓
- The supported similarity methods are now cosine for bag-of-words, and Tversky & Otsuka-Ochiai (oo) for sets (see the sketch below). 🙌
- Obtain a JS Set via the `as.set` helper. 😇
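A minimal sketch combining `as.set` with the set-based similarity methods; the require path is assumed to be the same similarity utility used above.

```js
// Minimal sketch: build JS Sets of normalized tokens and compare them with
// the set-based similarity methods.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const similarity = require( 'wink-nlp/utilities/similarity.js' );

const nlp = winkNLP( model );
const its = nlp.its;
const as = nlp.as;

const setA = nlp.readDoc( 'Bach composed fugues.' ).tokens().out( its.normal, as.set );
const setB = nlp.readDoc( 'Mozart composed operas.' ).tokens().out( its.normal, as.set );

console.log( similarity.set.tversky( setA, setB ) );
console.log( similarity.set.oo( setA, setB ) ); // Otsuka-Ochiai
```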
- No need to run the entire annotation pipeline: you can now select only the stages you want, or even run just tokenization by specifying an empty pipe (see the sketch below). 🤩🎉
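A minimal sketch of a trimmed pipeline, assuming the lite English web model:

```js
// Minimal sketch: an empty pipe skips sentence boundary detection, NER,
// POS tagging, etc., and runs tokenization only.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );

const tokenizeOnly = winkNLP( model, [] );
const doc = tokenizeOnly.readDoc( 'Tokenize this text, nothing more!' );

console.log( doc.tokens().out() );
```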
- Exposed the `its` and `as` helpers via the instance of winkNLP as well. 🤓
- Cosine similarity is available on Bag of Words. 🛍🔡🎉
- You can now use the `its.readabilityStats` helper to obtain a document's readability statistics, if it is supported by the language model (see the sketch below). 📚📊🤓
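A minimal sketch, assuming the loaded language model supports readability statistics:

```js
// Minimal sketch: document-level readability statistics.
// Assumes the loaded language model supports its.readabilityStats.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const doc = nlp.readDoc(
  'Short sentences are easy to read. Long, convoluted, clause-heavy sentences are considerably harder.'
);

console.log( doc.out( its.readabilityStats ) );
```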
- Now use the `its.lemma` helper to obtain the lemma of words (see the sketch below). 👏 🎉
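A minimal sketch; lemmatization relies on POS tags, so keep "pos" in the pipe (it is part of the default pipe). The sample output is indicative.

```js
// Minimal sketch: lemmas of tokens. Lemmatization uses POS tags, which the
// default pipe already provides.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const doc = nlp.readDoc( 'The mice were running.' );
console.log( doc.tokens().out( its.lemma ) );
// -> e.g. [ 'the', 'mouse', 'be', 'run', '.' ]
```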
- We have added support for a browser-ready language model. 🤩 🎉
- Now easily vectorize text using the BM25-based vectorizer. 🤓 👏
- Examples in the README now run on RunKit using the web model! ✅
- We have enabled add-ons to support enhanced language models, paving the way for new `its` helpers. 🎉
- Now use the `its.stem` helper to obtain the stems of words using the Porter Stemmer Algorithm V2 (see the sketch below). 👏
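A minimal sketch, assuming the loaded model (or its add-on) provides stems; the sample output is indicative.

```js
// Minimal sketch: Porter (V2) stems of tokens.
// Assumes the loaded model, or its add-on, supports its.stem.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const doc = nlp.readDoc( 'winning deliveries' );
console.log( doc.tokens().out( its.stem ) );
// -> e.g. [ 'win', 'deliveri' ]
```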
- Also benchmarked on Node.js v12 & v14, and updated the speed to the minimum observed. 🏃♀️
- Happy to release version 1.0.0 for you! 💫👏
- You can optionally include custom entity detection while running the speed benchmark. 😇
- Getting ready to move to version 1.0.0 — almost there! 💫
- Some behind-the-scenes updates to test cases. 😎
- Updated the version of English light language model to the latest — 0.3.0. 🙌
- No need to remember or copy/paste a long GitHub URL for language model installation. The new script installs the latest version for you automatically. 🎉
- We have added the `.parentCustomEntity()` API to the `.tokens()` API (see the sketch below). 👏
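A minimal sketch of reading a token's parent custom entity; the literal pattern and the fallback for tokens outside any custom entity are illustrative assumptions.

```js
// Minimal sketch: each token can report the custom entity it belongs to.
// The pattern and the handling of tokens outside any entity are illustrative.
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

nlp.learnCustomEntities( [
  { name: 'greeting', patterns: [ 'good morning' ] }
] );

const doc = nlp.readDoc( 'Good Morning, John!' );
doc.tokens().each( ( token ) => {
  const parent = token.parentCustomEntity();
  console.log( token.out( its.value ), parent ? parent.out( its.value ) : '-' );
} );
```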
- Accessing custom entities was failing whenever there were no custom entities. Now things are as they should be — it tells you that there are none! ✅
- We have improved the interface with the language model — it now supports the new format. 👍