Meteor Package: Yaki can capture relevant tags from any bunch of text.
rashinari/yaki
Yaki

Yaki can capture relevant tags from any bunch of text. Works on the client and on the server.

Features of Yaki:

  • Uses term normalizations to construct a list of terms
  • Uses stopword lists and a language-dependent alphabet as dictionaries
  • Calculates tag relevance via statistical methods such as entropy and the standard normal distribution
  • Uses n-grams for stemming and similarity detection
  • Can find word combinations (when they occur multiple times)
  • Currently supported languages: English and German
  • Uses language dependent feature configurations to improve QoS
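
To make the n-gram item above more concrete, here is a minimal JavaScript sketch of similarity detection with character bigrams and the Dice coefficient. It is an illustration of the general technique only, not Yaki's actual implementation; the function names `ngrams` and `similarity` are hypothetical and not part of the package.

```javascript
// Hypothetical helper, not part of Yaki:
// split a word into overlapping character n-grams (bigrams by default).
function ngrams(word, n = 2) {
  const w = word.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= w.length; i++) {
    grams.push(w.slice(i, i + n));
  }
  return grams;
}

// Hypothetical helper, not part of Yaki:
// Dice coefficient over the two n-gram lists: 2 * shared / (|a| + |b|).
function similarity(a, b, n = 2) {
  const ga = ngrams(a, n);
  const gb = ngrams(b, n);
  if (ga.length === 0 || gb.length === 0) {
    // Words shorter than n have no n-grams; fall back to exact comparison.
    return a.toLowerCase() === b.toLowerCase() ? 1 : 0;
  }
  const pool = gb.slice();
  let shared = 0;
  for (const g of ga) {
    const idx = pool.indexOf(g);
    if (idx !== -1) {
      shared++;
      pool.splice(idx, 1); // each n-gram of b may be matched only once
    }
  }
  return (2 * shared) / (ga.length + gb.length);
}

console.log(similarity('night', 'nacht')); // 0.25 (only 'ht' is shared)
```

Words that share a stem tend to share most of their n-grams, which is why a score like this can serve for both rough stemming and similarity detection without a vocabulary.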

Text Retrieval classification: morphology and parts of syntax (without vocabulary)

Beware: This is an early alpha test release and NOT suitable for production.

Installation

  $ meteor add nefiltari:yaki

How-To

For simple tagging (most features are activated by default), use the following syntax:

  text = "This is a sample text to demonstrate the tagging engine `Yaki`."
  console.log Yaki(text).extract()
  # -> [ 'demonstrate', 'yaki', 'engine', 'tagging' ]

If you know the language, you can specify it in the options object passed as the second parameter (use the top-level domain abbreviation, for example 'de'). The default language is English. You can also pass tags that you already know (or suspect) apply, which gives those words a stronger weight.

  text = "Dieser Beispieltext demonstriert das Tagging von Yaki in deutscher Sprache."
  console.log Yaki(text, {language: 'de', tags: ['yaki']}).extract()
  # -> [ 'yaki', 'demonstriert', 'beispieltext', 'deutscher', 'sprache' ]
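
The effect of known tags can be pictured with a small JavaScript sketch: count the terms that are not stopwords and let known tags count more than once. This is a deliberately simplified stand-in, assuming a plain frequency score instead of the entropy-based relevance Yaki uses; `STOPWORDS`, `rankTerms` and the `boost` option are hypothetical names, not part of the package.

```javascript
// Tiny illustrative stopword list (hypothetical; Yaki ships its own lists).
const STOPWORDS = new Set(['the', 'a', 'and', 'in', 'of', 'is', 'to']);

// Hypothetical helper, not part of Yaki:
// rank terms by frequency, counting known tags `boost` times per occurrence.
function rankTerms(text, { tags = [], boost = 2 } = {}) {
  const known = new Set(tags.map((t) => t.toLowerCase()));
  const scores = new Map();
  // Split on anything that is not a letter, so punctuation is dropped.
  for (const term of text.toLowerCase().split(/[^\p{L}]+/u)) {
    if (term === '' || STOPWORDS.has(term)) continue;
    const weight = known.has(term) ? boost : 1;
    scores.set(term, (scores.get(term) || 0) + weight);
  }
  // Highest score first; ties are broken alphabetically for stable output.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([term]) => term);
}

const sample = 'The engine tags text and the engine ranks yaki';
console.log(rankTerms(sample));
// [ 'engine', 'ranks', 'tags', 'text', 'yaki' ]
console.log(rankTerms(sample, { tags: ['yaki'], boost: 3 }));
// [ 'yaki', 'engine', 'ranks', 'tags', 'text' ]
```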

You can also use clean() to normalize an array of words, fragments or tags.

  fragments = ['(legend)', 'advanced.', 'MultiColor', '-> HTTP <-']
  console.log Yaki(fragments).clean()
  # -> [ 'legend', 'advanced', 'multicolor', 'http' ]
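
As a rough picture of what such a normalization involves, the following JavaScript sketch lowercases each fragment and strips everything that is not a letter or digit. It only approximates the behaviour shown above and assumes nothing about Yaki's internals; the standalone function `clean` here is a hypothetical re-implementation, not the package's method.

```javascript
// Hypothetical stand-in for the normalization step, not Yaki's own code:
// lowercase each fragment, drop non-alphanumeric characters, skip empties.
function clean(fragments) {
  return fragments
    .map((f) => f.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, ''))
    .filter((f) => f.length > 0);
}

console.log(clean(['(legend)', 'advanced.', 'MultiColor', '-> HTTP <-']));
// [ 'legend', 'advanced', 'multicolor', 'http' ]
```

Note that this sketch also removes whitespace inside a fragment, so a multi-word fragment would be collapsed into one term.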

ToDo

  • Instead of transferring the heavy stopword-lists to the client, proxy client requests through a server method
  • Improve the algorithm to find multi-word phrases instead of just single words
  • Refactor the source code to improve readability and performance even further
  • Add more test cases to ensure quality and enable better collaboration

License

This code is licensed under the LGPL 3.0. Do whatever you want with this code, but I'd like to get improvements and bugfixes back.
