Active Learning

A supervised learning paradigm used when obtaining labeled examples is costly: the learner chooses which unlabeled examples are worth sending for annotation.

1. Strategy

Of the several active learning strategies, two are covered here.

1.1. Data density and uncertainty based

  1. train a current model on the labeled examples.
  2. for each unlabeled feature vector x, compute an importance score as follows:
    • importance(x) = density(x) * uncertainty(x)
    • density reflects how many examples surround x in its close neighborhood
      • can be estimated from the average distance from x to its k nearest neighbors (k being a hyperparameter); a smaller average distance means a higher density, so the inverse of the average is one option.
    • uncertainty reflects how uncertain the current model's prediction for x is (for classification, derived from the predicted class probabilities of x, e.g. their entropy).
  3. once importance scores are obtained for all unlabeled feature vectors, ask a domain expert to annotate the ones with the highest scores.
  4. add the newly annotated examples to the training set.
  5. continue updating the training set this way until a stopping criterion is met (for instance, a maximum number of requests per annotation session), then build the final model.
  6. repeat the cycle at the desired frequency.
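
The scoring and selection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: density is taken as the inverse of the mean distance to the k nearest neighbors, uncertainty as the entropy of the predicted class probabilities, and the `predicted` list stands in for the output of the current model. All function and variable names are made up for this example.

```python
import math

def knn_density(i, points, k):
    """Density proxy for points[i]: inverse of the mean distance to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    nearest = dists[:k]
    mean_dist = sum(nearest) / len(nearest)
    return 1.0 / (mean_dist + 1e-12)  # small epsilon guards against duplicate points

def entropy(probs):
    """Shannon entropy of predicted class probabilities (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_importance(points, probs, k=2):
    """Return (indices sorted by descending importance, importance scores)."""
    scores = [knn_density(i, points, k) * entropy(probs[i]) for i in range(len(points))]
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy pool (assumed data): three clustered points and one isolated point.
unlabeled = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
# Stand-in for the current model's predicted class probabilities.
predicted = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]

order, scores = rank_by_importance(unlabeled, predicted, k=2)
to_annotate = order[:2]  # annotation budget of 2 requests
print(to_annotate)
```

In this toy pool the isolated point (index 3) is as uncertain as index 0 but ranks last, because its low density pulls the product down. That is the purpose of the density term: it keeps annotation requests away from outliers.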

1.2. Support vector based

  1. build an SVM model using the labeled data
  2. ask the expert to annotate the unlabeled examples that lie closest to the hyperplane separating the two classes (binary classification)
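
The selection step can be sketched as follows, assuming a linear SVM has already been trained and has produced a weight vector `w` and bias `b`. The values below are hypothetical, as are the function names. With a library such as scikit-learn, ranking by the absolute value of `SVC.decision_function` would play the same role.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def closest_to_boundary(points, w, b, budget):
    """Indices of the `budget` points lying closest to the separating hyperplane."""
    order = sorted(range(len(points)),
                   key=lambda i: distance_to_hyperplane(points[i], w, b))
    return order[:budget]

# Hypothetical hyperplane from an already trained linear SVM (assumed values).
w, b = (1.0, -1.0), 0.0
unlabeled = [(3.0, 0.0), (1.0, 0.9), (0.0, 2.0), (2.0, 2.2)]

queries = closest_to_boundary(unlabeled, w, b, budget=2)
print(queries)
```

Points near the hyperplane are the ones the SVM is least sure about, so their labels are expected to move the decision boundary the most.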
Tags::ml:ai: