hapax_legomena

C++ hapax legomena finder

"Hapax legomena" (Greek for "once said") are words that appear only once in a body of text.

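This program can find hapax legomena, as well as dis legomena ("twice said"), tris legomena ("thrice said"), and so on. You choose which by setting the search depth, the exact number of times a word must occur to be reported (1 for hapax legomena, 2 for dis legomena, 3 for tris legomena).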
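The idea behind the search depth can be sketched as a word-frequency count followed by a filter that keeps only words whose count equals the depth. This is a minimal illustration, not this program's actual source: the function names `count_words` and `legomena` are hypothetical, and it assumes the text has already been split into words.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how many times each word occurs.
std::map<std::string, std::size_t> count_words(const std::vector<std::string>& words) {
    std::map<std::string, std::size_t> counts;
    for (const auto& w : words) {
        ++counts[w];  // operator[] inserts 0 for unseen words, then increments
    }
    return counts;
}

// Hypothetical helper: returns every word that occurs exactly `depth` times.
// depth == 1 -> hapax legomena, depth == 2 -> dis legomena, and so on.
std::vector<std::string> legomena(const std::vector<std::string>& words, std::size_t depth) {
    std::vector<std::string> result;
    for (const auto& [word, n] : count_words(words)) {
        if (n == depth) {
            result.push_back(word);
        }
    }
    return result;  // alphabetical, because std::map keeps its keys ordered
}
```

For example, with the words `a b a c b a`, depth 1 yields `c`, depth 2 yields `b`, and depth 3 yields `a`. `std::map` is used here only so the output comes out sorted; `std::unordered_map` would usually count faster on a large text, at the cost of unordered results.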

It's fairly fast: it can process the full text of Moby Dick in a matter of seconds. The regex checking is currently the slowest part, and I hope to make it faster in future versions.
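One common way to avoid regex matching in the hot loop is a hand-written tokenizer that scans the text once, character by character. The sketch below is an alternative technique, not what this program currently does; the function name `tokenize` is hypothetical, and it assumes a word is a maximal run of letters as classified by `std::isalpha` in the default "C" locale (so apostrophes, hyphens and digits split words), which may not match the program's regex rules.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical regex-free tokenizer: a word is a maximal run of letters,
// lowercased so that "Whale" and "whale" count as the same word.
// Any non-letter character (space, punctuation, apostrophe, digit) ends a word.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {  // unsigned char keeps <cctype> calls well-defined
        if (std::isalpha(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // flush a word that runs to the end of the text
    }
    return words;
}
```

For instance, `tokenize("Call me Ishmael. Some years ago--")` produces `call`, `me`, `ishmael`, `some`, `years`, `ago`. Because each character is examined once with a cheap classification call, this avoids the per-match overhead of `std::regex`, though whether it is faster in practice depends on the program's existing implementation.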