Siamese Neural Networks

1. Triplet Loss Function

Starting out with three images:

  • A : anchor
  • P : positive
  • N : negative

    … such that A and P are two different pictures of the same person, while N is a picture of a different person.

Given a model f that maps an input image to an embedding, the triplet loss is defined as:

;; f maps an input image to an embedding vector; the model itself is elided
(defun f (input-image)
  (...))

;; squared euclidean distance between two embedding vectors
(defun L2-norm-squared (a b)
  (reduce #'+ (map 'list (lambda (x y) (expt (- x y) 2)) a b)))

;; alpha is the margin hyperparameter (a small positive constant, e.g. 0.2)
(defparameter alpha 0.2)

;; A, P, N are images as defined above
(defun triplet-loss (A P N)
  (max (+ (- (L2-norm-squared (f A) (f P))
             (L2-norm-squared (f A) (f N)))
          alpha)
       0))

Intuitively, this loss pulls the embeddings of A and P closer together while pushing the embedding of N further away from A. alpha is a positive margin hyperparameter: a larger alpha forces a bigger gap between the anchor-positive distance and the anchor-negative distance, while a smaller alpha makes the constraint more lenient.

The overall cost function is the average of all such triplet losses.
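As a minimal sketch (assuming triplet-loss as defined above and a hypothetical list of (A P N) triplets), the cost is just the mean:

;; hypothetical helper: mean triplet loss over a list of (A P N) triplets
(defun triplet-cost (triplets)
  (/ (reduce #'+ (mapcar (lambda (tr) (apply #'triplet-loss tr)) triplets))
     (length triplets)))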

After forward passes and back-propagation over all such triplets in a dataset (note that "one shot" does not mean training on only one sample, but inference with respect to a single instance), we have a Siamese neural network.

To use it, given two pictures P1 and P2, if the euclidean distance between their embeddings (the forward passes) is less than a threshold (a hyperparameter), we mark the pair as two pictures of the same person.
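A minimal sketch of this verification step, reusing f and L2-norm-squared from above; the threshold value here is only a placeholder, not a recommendation:

;; threshold on the (squared) distance between embeddings; a hyperparameter
(defparameter *threshold* 1.0)

;; returns T when P1 and P2 are judged to show the same person
(defun same-person-p (P1 P2)
  (< (L2-norm-squared (f P1) (f P2)) *threshold*))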

It's called one-shot because only one reference picture is needed for a specific comparison (the training phase still requires a larger dataset, though).

1.1. Triplet selection

  • most triplets drawn at random from a dataset already satisfy the margin and so contribute little toward the objective.
  • the triplet selection strategy therefore matters for faster convergence:
    • it is beneficial to select hard positives (for an anchor A, the P that maximises the anchor-positive distance) and hard negatives (the N that minimises the anchor-negative distance).
  • it isn't feasible to compute these argmaxes and argmins over all triplets in the dataset.
    • so, one approach is to generate triplets offline (pausing the training loop) every n steps, using the most recent network checkpoint to compute the argmins and argmaxes on a subset of the data.
    • another, online approach (part of the training loop) is to mine hard positives and negatives from within the current minibatch (a rough sketch follows this list).
      • do note that the minibatch here needs to be large enough for enough hard positives to be present (FaceNet used 1000+ faces in a minibatch).
  • read more about this process in section 3.2 of the FaceNet paper: https://arxiv.org/pdf/1503.03832.pdf
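A rough sketch of the online idea, assuming a minibatch is a list of (image . person-id) pairs and reusing f and L2-norm-squared from above. The names and data layout are illustrative only; FaceNet itself uses all anchor-positive pairs and prefers semi-hard negatives, whereas this sketch simply picks the hardest positive and hardest negative per anchor:

;; for one anchor, pick the hardest positive (farthest same-id image) and
;; the hardest negative (closest different-id image) within the minibatch;
;; returns (anchor hardest-positive hardest-negative), with nil if none found
(defun mine-triplet (anchor anchor-id minibatch)
  (let ((fa (f anchor))
        (hard-p nil) (max-dp -1)
        (hard-n nil) (min-dn most-positive-double-float))
    (dolist (pair minibatch (list anchor hard-p hard-n))
      (destructuring-bind (image . id) pair
        (unless (eq image anchor)
          (let ((d (L2-norm-squared fa (f image))))
            (cond ((and (eql id anchor-id) (> d max-dp))
                   (setf hard-p image max-dp d))
                  ((and (not (eql id anchor-id)) (< d min-dn))
                   (setf hard-n image min-dn d)))))))))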
Tags::nn:ml:ai: