A tutorial on how to count hapaxes (words that occur only once in a text or corpus) using NLTK.
Some alternatives mentioned:
- Pattern: a Python package for web data mining that includes submodules for language processing and machine learning
- Polyglot: a language library focusing on "massive multilingual applications"
- spaCy: an "industrial strength" NLP library focused on performance, with a streamlined API
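
To make the idea of a hapax concrete, here is a minimal sketch that uses only the Python standard library rather than the tutorial's NLTK code. The function name `find_hapaxes`, the sample sentence, and the naive regex tokenization are assumptions made for illustration. With NLTK, `FreqDist` offers a `hapaxes()` method that serves the same purpose.

```python
import re
from collections import Counter


def find_hapaxes(text):
    """Return the sorted list of words that occur exactly once in `text`."""
    # Naive tokenization (an assumption): lowercase the text and keep
    # runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A hapax is any token whose frequency is exactly 1.
    return sorted(word for word, n in counts.items() if n == 1)


sample = "the cat sat on the mat while the dog sat outside"
print(find_hapaxes(sample))
```

A real corpus would likely need a proper tokenizer (handling punctuation, Unicode, and so on), which is where NLTK or one of the libraries listed above comes in.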