The Handbook of NLP with Gensim

The previous section set the number of topics to 20 arbitrarily. Is this the optimal number? To investigate, we need to understand the “scope” of a topic. The words in a topic can be loosely or closely connected. A topic with closely connected words is distinctive, whereas a topic with loosely connected words is not distinctive enough. In other words, the “closeness” of the words in a topic is an important measure. If the words in a topic are only very loosely connected, the topic may be better split into more than one topic.
To measure the “closeness” of a topic, Röder, Both, and Hinneburg (2015) [5] proposed a metric called the coherence score. The score is defined as the average or median of the pairwise similarities among the top words of a given topic. The value of a coherence score itself doesn’t have a universal meaning because it varies, based on the scoring...
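
To make the definition concrete, the following is a minimal sketch of "average or median of pairwise word similarities" in plain Python. The similarity measure used here (Jaccard overlap of the documents in which two words appear) is an assumption chosen for simplicity, not the measure from the cited paper. The toy corpus and the names `jaccard_similarity` and `topic_coherence` are hypothetical as well.

```python
from itertools import combinations
from statistics import mean, median


def jaccard_similarity(word_a, word_b, documents):
    """Fraction of documents containing either word that contain both.

    Assumption: this stands in for whatever word-similarity measure
    a real coherence score would use.
    """
    docs_a = {i for i, doc in enumerate(documents) if word_a in doc}
    docs_b = {i for i, doc in enumerate(documents) if word_b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def topic_coherence(top_words, documents, aggregate=mean):
    """Aggregate the pairwise similarities over a topic's top words.

    `top_words` needs at least two words; `aggregate` can be
    `statistics.mean` or `statistics.median`.
    """
    scores = [
        jaccard_similarity(a, b, documents)
        for a, b in combinations(top_words, 2)
    ]
    return aggregate(scores)


# Hypothetical toy corpus: each document is a set of tokens.
documents = [
    {"stock", "market", "trade"},
    {"stock", "market", "price"},
    {"market", "trade", "price"},
    {"goal", "match", "team"},
]

tight_topic = ["stock", "market", "trade"]  # words that co-occur often
loose_topic = ["stock", "goal", "price"]    # words that rarely co-occur

tight_score = topic_coherence(tight_topic, documents)
loose_score = topic_coherence(loose_topic, documents)
median_score = topic_coherence(tight_topic, documents, aggregate=median)

print(f"tight topic (mean):   {tight_score:.3f}")
print(f"loose topic (mean):   {loose_score:.3f}")
print(f"tight topic (median): {median_score:.3f}")
```

In this sketch the closely connected topic scores higher than the loosely connected one, which is the behavior the "closeness" idea describes. For real models, Gensim provides `gensim.models.CoherenceModel`, which computes coherence scores directly from a trained topic model.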