Introduction to Semi-supervised Learning

Semi-supervised learning falls between supervised learning and unsupervised learning. It has been shown that unlabeled data, when used in conjunction with a small amount of labeled data, can considerably improve learning accuracy. Labeled data is often produced by a skilled human annotator who manually classifies training examples. The cost of this labeling process can be prohibitive, whereas generating unlabeled data is relatively inexpensive. In such situations, semi-supervised learning is an attractive approach.

Two commonly used semi-supervised methods are Self-Training and Co-Training. Both are presented below.

Self-Training

Self-Training (also known as Bootstrapping) is probably the oldest semi-supervised learning method. The idea behind Self-Training is simple: train a classifier on a seed set of labeled data, use it to label a set of unlabeled data, and add part of the newly labeled data back to the labeled set for retraining. The process may be continued until a stopping condition is reached.


procedure \(SelfTrain(L_0,U)\)

  1. \(L_0\) is seed labeled data, \(L\) is labeled data
  2. \(U\) is unlabeled data
  3. \(classifier \leftarrow train(L_0)\)
  4. \(L \leftarrow L_0 + select(label(U,classifier))\)
  5. \(classifier \leftarrow train(L)\)
  6. if no stopping criterion is met, go to step 4
  7. return \(classifier\)

The above procedure shows a basic form of Self-Training. The \(train\) function is a supervised classifier, called the base learner. It is assumed that the base learner produces confidence-weighted predictions. The \(select\) function selects the instances on which the classifier is most confident.

One simple stopping criterion is to repeat steps 4-6 for a fixed, arbitrary number of rounds. Another is to run until convergence, i.e. until the labeled data and the classifier stop changing.
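The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the nearest-centroid base learner and the confidence threshold of 0.9 are illustrative choices, and any supervised classifier that produces confidence-weighted predictions could serve as the base learner.

```python
import numpy as np

class CentroidClassifier:
    """Tiny stand-in base learner: nearest class centroid, with a
    softmax-style confidence over negative centroid distances."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        e = np.exp(-d)  # closer centroid -> higher score
        return e / e.sum(axis=1, keepdims=True)
    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_rounds=10):
    """Basic Self-Training: repeatedly label the unlabeled pool and
    absorb only the high-confidence predictions (the select step)."""
    clf = CentroidClassifier().fit(X_labeled, y_labeled)  # step 3: train(L_0)
    L_X, L_y, U = X_labeled, y_labeled, X_unlabeled
    for _ in range(max_rounds):                 # fixed-rounds stopping criterion
        if len(U) == 0:
            break
        proba = clf.predict_proba(U)
        keep = proba.max(axis=1) >= threshold   # step 4: select most confident
        if not keep.any():                      # converged: nothing confident left
            break
        L_X = np.vstack([L_X, U[keep]])
        L_y = np.concatenate([L_y, clf.classes_[proba[keep].argmax(axis=1)]])
        U = U[~keep]
        clf = CentroidClassifier().fit(L_X, L_y)  # step 5: retrain on grown L
    return clf
```

Note that the threshold trades off coverage against noise: a lower threshold labels more of the pool per round but risks adding wrong labels that the retrained classifier then reinforces.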

Co-Training

Co-Training is a semi-supervised method that requires two views of the data. In other words, it assumes that each instance is described by two different feature sets. The two feature sets should provide different, but complementary, information about the instances.


procedure \(CoTrain(L,U)\)

  1. \(L\) is labeled data
  2. \(U\) is unlabeled data
  3. \(P \leftarrow\) random selection from \(U\)
  4. \(f_1 \leftarrow train(view_1(L))\)
  5. \(f_2 \leftarrow train(view_2(L))\)
  6. \(L \leftarrow L + select(label(P,f_1)) + select(label(P,f_2))\)
  7. Remove the labeled instances from \(P\)
  8. \(P \leftarrow P + \) random selection from \(U\)
  9. if no stopping criterion is met, go to step 4
  10. return \(f_1, f_2\)

The above procedure shows a basic form of Co-Training. As in Self-Training, the \(train\) function is a supervised classifier assumed to produce confidence-weighted predictions, and the \(select\) function selects the instances on which each classifier is most confident. The stopping criteria discussed for Self-Training apply to Co-Training without modification.
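The procedure can likewise be sketched in Python. This is a minimal illustration under stated assumptions: the two views are obtained by splitting the feature columns at an index `split` (a stand-in for genuinely distinct feature sets), the base learner is a simple nearest-centroid classifier, and when both classifiers nominate the same instance, the label comes from the more confident one.

```python
import numpy as np

class CentroidClassifier:
    """Tiny stand-in base learner: nearest class centroid, with a
    softmax-style confidence over negative centroid distances."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        e = np.exp(-d)
        return e / e.sum(axis=1, keepdims=True)

def co_train(X_l, y_l, X_u, split, rounds=5, pool_size=40, k=3, seed=0):
    """Basic Co-Training with views = feature columns before/after `split`."""
    rng = np.random.default_rng(seed)
    X_u = X_u.copy()
    rng.shuffle(X_u)
    P, U = X_u[:pool_size], X_u[pool_size:]          # step 3: random pool P
    L_X, L_y = X_l, y_l
    for _ in range(rounds):                          # fixed-rounds stopping rule
        f1 = CentroidClassifier().fit(L_X[:, :split], L_y)   # step 4
        f2 = CentroidClassifier().fit(L_X[:, split:], L_y)   # step 5
        if len(P) == 0:
            break
        p1 = f1.predict_proba(P[:, :split])
        p2 = f2.predict_proba(P[:, split:])
        # step 6: each classifier nominates its k most confident instances;
        # overlaps are labeled by whichever classifier is more confident
        idx = np.union1d(np.argsort(p1.max(1))[-k:], np.argsort(p2.max(1))[-k:])
        labels = np.where(p1[idx].max(1) >= p2[idx].max(1),
                          f1.classes_[p1[idx].argmax(1)],
                          f2.classes_[p2[idx].argmax(1)])
        L_X = np.vstack([L_X, P[idx]])
        L_y = np.concatenate([L_y, labels])
        P = np.delete(P, idx, axis=0)                # step 7: drop labeled instances
        take = min(len(U), len(idx))
        P = np.vstack([P, U[:take]])                 # step 8: replenish P from U
        U = U[take:]
    return f1, f2
```

Splitting columns is only a convenience here; in practice the two views come from naturally distinct feature sets, such as the words on a web page versus the words in links pointing to it.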

The SelfTrain and CoTrain procedures above suggest strong similarities between Co-Training and Self-Training. If the classifier pair \((f_1, f_2)\) is considered a single classifier with internal structure, then Co-Training can be seen as a special case of Self-Training.