Siamese Neural Networks
- useful in the case of few-shot learning.
- can be built from any kind of neural network: the difference arises in how we engineer the loss function
- specifically, we use a triplet loss (specializing here to computer vision applications)
1. Triplet Loss Function
Starting out with three images:
- A : anchor
- P : positive
- N : negative
.. such that A and P are two different pictures of the same person while N is a picture of another person.
Given a model f that produces an embedding from an input image, the triplet loss is defined as:
;; f is the embedding model; its body is architecture specific and left as a stub
(defun f (input-image)
  ;; returns the embedding vector for INPUT-IMAGE
  (error "model specific: replace with a forward pass through the network"))

;; squared euclidean distance between two embedding vectors
(defun L2-norm-squared (a b)
  (reduce #'+ (map 'vector (lambda (x y) (expt (- x y) 2)) a b)))

;; the margin hyperparameter discussed below
(defparameter alpha 0.2)

;; A, P and N are images as defined above
(defun triplet-loss (A P N)
  (max (+ (- (L2-norm-squared (f A) (f P))
             (L2-norm-squared (f A) (f N)))
          alpha)
       0))
Intuitively, this loss pushes the embeddings of A and P closer together while pushing the embedding of N further from A's. alpha is a positive margin hyperparameter: the loss only reaches zero once the A-N distance exceeds the A-P distance by at least alpha. For example, with d(A,P) = 0.5, d(A,N) = 1.0 and alpha = 0.2, the loss is max(0.5 - 1.0 + 0.2, 0) = 0. Increasing alpha demands a larger separation between the two terms; a lower alpha makes the model more lenient.
The overall cost function is the average of all such triplet losses.
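A minimal sketch of that averaging, assuming the dataset's triplets are collected as a list of (A P N) lists (the name total-cost is illustrative, not from any source):

;; average triplet loss over a dataset of (A P N) triplets
(defun total-cost (triplets)
  (/ (reduce #'+ (mapcar (lambda (triplet) (apply #'triplet-loss triplet))
                         triplets))
     (length triplets)))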
After running forward passes and backpropagation over all such triplets in a dataset (note that one shot doesn't imply training on only one sample, but inference with respect to a single instance), we have a Siamese neural network.
To use it, given two pictures P1 and P2, we compute their embeddings with a forward pass; if the euclidean distance between the embeddings is less than a threshold (a hyperparameter), we mark the pair as being of the same person.
It's called one shot because you need only one picture per identity for a specific comparison (the training phase still needs a larger dataset, though).
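A minimal sketch of this verification step, reusing f and L2-norm-squared from above. The threshold value is an arbitrary placeholder, and since we compare the squared distance, the threshold lives on the squared scale:

;; the threshold is a hyperparameter; 1.0 is a placeholder value
(defparameter same-person-threshold 1.0)

;; T if P1 and P2 are judged to be pictures of the same person
(defun same-person-p (P1 P2)
  (< (L2-norm-squared (f P1) (f P2)) same-person-threshold))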
1.1. Triplet selection
- most triplets from a dataset wouldn't contribute much toward the objective: easy triplets already satisfy the margin, so their loss (and gradient) is zero.
- the triplet selection strategy then becomes important, for faster convergence:
- it is beneficial to select hard positives (the P furthest from A among A's positives) and hard negatives (the N closest to A among A's negatives).
- it isn't feasible to compute these maxes and mins over the entire dataset.
- so, one approach is to generate triplets offline (pausing the training loop) every n steps, using the most recent network checkpoint to compute the argmins and argmaxes on a subset of the data
- another, online approach (part of the training loop) is to mine hard positives and negatives from within the current minibatch (a sketch follows this list)
- do note that the minibatch here needs to be large enough for enough hard positives to be formed (FaceNet used 1000+ faces in a minibatch)
- read more about this process in section 3.2 of the FaceNet paper: https://arxiv.org/pdf/1503.03832.pdf
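A minimal sketch of the in-batch mining step, under assumed conventions: embeddings for the current minibatch are precomputed and stored as (embedding . identity) pairs, and we pick the hardest negative for a given anchor. (The paper actually prefers semi-hard negatives over the absolute hardest ones, since the latter can collapse training early on; this sketch only shows the basic selection.)

;; pick the hardest negative for an anchor: the closest embedding in the
;; minibatch that belongs to a different identity
(defun hardest-negative (anchor-embedding anchor-identity batch)
  (let ((best nil)
        (best-distance nil))
    (dolist (entry batch best)
      (destructuring-bind (embedding . identity) entry
        (when (not (equal identity anchor-identity))
          (let ((d (L2-norm-squared anchor-embedding embedding)))
            (when (or (null best-distance) (< d best-distance))
              (setq best embedding
                    best-distance d))))))))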