Ladder Network
An upgrade over a denoising autoencoder that adds a supervised classification objective, enabling semi-supervised learning.
- The encoder and decoder have the same number of layers.
- The softmaxed bottleneck is used for the predictions.
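A minimal NumPy sketch of such a symmetric forward pass, assuming affine + ReLU layers and a mirrored decoder (these architectural details are illustrative assumptions, not specified in the notes):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, enc_weights, dec_weights):
    """Symmetric encoder/decoder sketch: the decoder mirrors the encoder
    layer for layer, and the softmaxed bottleneck (top encoder activation)
    gives the class prediction."""
    assert len(enc_weights) == len(dec_weights)  # same depth on both sides
    acts = [x]
    for W in enc_weights:
        acts.append(np.maximum(W @ acts[-1], 0.0))  # encoder: affine + ReLU
    probs = softmax(acts[-1])                       # prediction from the bottleneck
    h = acts[-1]
    recons = [h]
    for W in reversed(dec_weights):                 # decoder walks back down
        h = np.maximum(W @ h, 0.0)
        recons.append(h)
    recons.reverse()                                # recons[i] pairs with acts[i]
    return probs, acts, recons
```

Pairing the lists so `recons[i]` reconstructs `acts[i]` is what makes the layer-wise cost terms below easy to compute.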
The overall loss function is built up of several individual components:
- for each corresponding pair of encoder layer (counted from the input) and decoder layer (counted from the output), one cost term penalizes the squared difference between their activations (squared Euclidean distance).
- one component is the negative log-likelihood of the actual label under the prediction from the softmaxed bottleneck.
- the final loss is a linear combination of the above, with the coefficients of the layer-wise reconstruction terms as hyperparameters.
- these hyperparameters control the tradeoff between classification accuracy and encoding-decoding quality.
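The combination above can be sketched as follows (function and argument names are illustrative assumptions; `lambdas` holds the per-layer hyperparameters):

```python
import numpy as np

def ladder_loss(encoder_acts, decoder_acts, probs, label, lambdas):
    """Combined loss sketch: supervised NLL plus weighted layer-wise
    reconstruction costs.

    encoder_acts: clean encoder activations, input layer first.
    decoder_acts: decoder reconstructions, matched so decoder_acts[i]
                  reconstructs encoder_acts[i].
    probs: softmax output at the bottleneck.
    label: integer class index of the training example.
    lambdas: per-layer weights (hyperparameters) on the reconstruction terms.
    """
    # Supervised term: negative log-likelihood of the true class.
    nll = -np.log(probs[label])
    # Unsupervised terms: squared Euclidean distance per layer pair.
    recon = sum(lam * np.sum((e - d) ** 2)
                for lam, e, d in zip(lambdas, encoder_acts, decoder_acts))
    return nll + recon
```

Setting a layer's entry in `lambdas` to zero turns off its reconstruction penalty, which is one way to explore the classification/reconstruction tradeoff.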
- In addition to jittering the input with Gaussian noise, each intermediate encoder layer is corrupted with Gaussian noise as well.
- this is done only during training, not when inferring a prediction for a feature vector.
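A sketch of a single encoder layer with train-time-only corruption (the affine + ReLU layer form and `noise_std` default are illustrative assumptions):

```python
import numpy as np

def encoder_layer(x, W, training, noise_std=0.3, rng=None):
    """Affine map + ReLU, with Gaussian noise added to the
    pre-activation only while training; inference is deterministic."""
    rng = rng or np.random.default_rng()
    z = W @ x
    if training:
        z = z + rng.normal(0.0, noise_std, size=z.shape)  # corrupt only in training
    return np.maximum(z, 0.0)  # ReLU
```

Passing `training=False` at inference time reproduces the clean, noise-free path described above.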