
Doc2vec

2017


Semi-supervised text classification using doc2vec and label spreading

·2 mins

Here is a simple way to classify text that needs little human effort and still performs well.

It can be divided into two steps:

  1. Get training data by using keyword-based classification
  2. Train a more accurate classification model by using doc2vec and label spreading
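The second step can be sketched as follows. This is a hand-rolled label spreading (the normalized-graph variant of Zhou et al.) over small toy vectors that stand in for document vectors from a trained doc2vec model. It is only meant to show the mechanics, not the author's code. In practice scikit-learn's `sklearn.semi_supervised.LabelSpreading` does the same job. The toy data, `gamma`, and `alpha` values here are illustrative assumptions. As in scikit-learn, `-1` marks a document that the keyword step left unlabeled.

```python
import math


def rbf_affinity(vectors, gamma):
    """Dense RBF affinity matrix with a zero diagonal: w_ij = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
                w[i][j] = math.exp(-gamma * d2)
    return w


def label_spreading(vectors, labels, n_classes, gamma=1.0, alpha=0.2, n_iter=100):
    """Spread labels over a similarity graph; labels[i] == -1 means unlabeled.

    O(n^2) per iteration, so this sketch only suits small inputs.
    """
    n = len(vectors)
    w = rbf_affinity(vectors, gamma)
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^-1/2 W D^-1/2
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y; unlabeled rows are all zeros
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        # F <- alpha * S F + (1 - alpha) * Y
        f = [
            [alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
             for c in range(n_classes)]
            for i in range(n)
        ]
    # Each document takes the class with the largest spread score
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Toy "document vectors": two clusters, one keyword-labeled document in each
vectors = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, -1, -1, 1, -1, -1]
print(label_spreading(vectors, labels, n_classes=2))  # [0, 0, 0, 1, 1, 1]
```

Unlabeled documents inherit the label of the keyword-labeled documents they sit close to in vector space. This is why the noisy keyword labels from step 1 can be enough to seed a better classifier.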

Keyword-based Classification

Keyword-based classification is a simple but effective method. Extracting the target keywords by hand is monotonous work, so I use this method to automatically extract keyword candidates.
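A minimal sketch of the labeling step follows. It assumes each class already has a keyword list. The `KEYWORDS` sets and the `keyword_label` helper are hypothetical, not taken from the post. Any document that matches no class, or more than one, is left unlabeled (`-1`) so that the label-spreading step can decide.

```python
import re

# Hypothetical keyword lists per class; in practice they would come from
# the keyword-candidate extraction step.
KEYWORDS = {
    0: {"refund", "invoice", "payment"},
    1: {"crash", "bug", "error"},
}

UNLABELED = -1  # same convention scikit-learn's semi-supervised estimators use


def keyword_label(text, keywords=KEYWORDS):
    """Return a class id if exactly one class's keywords appear, else UNLABELED."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = [label for label, words in keywords.items() if tokens & words]
    # Only trust unambiguous matches; everything else stays unlabeled
    return hits[0] if len(hits) == 1 else UNLABELED


docs = ["Where is my refund?", "The app shows an error", "refund after crash", "hello there"]
print([keyword_label(d) for d in docs])  # [0, 1, -1, -1]
```

Skipping ambiguous matches trades coverage for precision. That suits this pipeline, because the labeled subset only needs to be accurate, not complete.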

Parameters in doc2vec

·2 mins

Here are some parameters of gensim's Doc2Vec class.

window

window is the maximum distance between the predicted word and the context words used to predict it within a document. The window extends in both directions, so it covers words both behind and ahead of the predicted word.
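To make the definition concrete, the hypothetical helper below lists the context words that fall within `window` positions of a target word, on both sides. In gensim, the value is passed to the constructor, e.g. `Doc2Vec(documents, window=5)`; 5 is the default. gensim, like the original word2vec, may also sample a smaller effective window per word, which is why `window` is described as a maximum. The helper ignores that sampling.

```python
def context_window(tokens, position, window):
    """Context words within `window` positions of tokens[position], looking
    both behind and ahead; the predicted word itself is excluded."""
    start = max(0, position - window)
    end = min(len(tokens), position + window + 1)
    return tokens[start:position] + tokens[position + 1:end]


tokens = "the quick brown fox jumps over the lazy dog".split()
print(context_window(tokens, 3, 2))  # ['quick', 'brown', 'jumps', 'over']
print(context_window(tokens, 0, 2))  # ['quick', 'brown']
```

Near the start or end of a document, the window is simply cut off, so fewer context words are available.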