t-SNE

aka t-distributed stochastic neighbor embedding.

  • used to visualize high-dimensional data in 2 or 3 dimensions.
  • can be used for semantic analysis of images, speech, text, etc. once they are encoded into meaningful embeddings; for instance, see Text Representation.
  • falls under the umbrella of dimensionality reduction techniques; specifically, it is a non-linear dimensionality reduction technique.
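
A minimal usage sketch of the points above, assuming scikit-learn is installed. The dataset (the bundled digits images), the subset size of 300, and the hyperparameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Illustrative data: 8x8 digit images flattened to 64-dimensional vectors.
digits = load_digits()
X = digits.data[:300]        # 300 samples, 64 dimensions each
labels = digits.target[:300]

# Hyperparameter values here are assumptions chosen only for the example.
tsne = TSNE(
    n_components=2,      # target dimensionality for visualization
    perplexity=30.0,     # must be smaller than the number of samples
    learning_rate=200.0,
    init="pca",
    random_state=0,      # fix the seed so runs are repeatable
)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # (300, 2)
```

The resulting 2-D coordinates are typically drawn as a scatter plot colored by `labels` to see whether similar items end up near each other.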

0.1. Conceptual

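  • at a very high level, t-SNE aims to preserve neighborhoods: points that are close in the high-dimensional space should stay close in the low-dimensional map.
  • closeness in the original space is modelled as a probability using Gaussian kernels, and closeness in the low-dimensional map using a heavy-tailed Student's t-distribution; the map is then chosen to minimize the KL divergence between the two sets of probabilities.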
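
The idea above can be sketched in plain Python. This is a simplified illustration, not a full implementation: it uses one fixed Gaussian bandwidth `sigma` for every point (real t-SNE tunes a bandwidth per point from the perplexity), it performs no optimization, and all function names and toy data are made up for the example. It only compares the KL divergence of two hand-written candidate maps.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_affinities(points, sigma=1.0):
    """High-dimensional similarities p_ij (fixed sigma is a simplification)."""
    n = len(points)
    cond = []
    for i in range(n):
        w = [0.0 if i == j else
             math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])   # conditional p(j|i)
    # Symmetrize so that all p_ij sum to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]

def student_t_affinities(embedding):
    """Low-dimensional similarities q_ij from a Student t kernel (1 dof)."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(p, q):
    """KL(P || Q), the cost t-SNE minimizes over the embedding."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

# Toy data: two tight clusters in 3-D.
high_dim = [(0, 0, 0), (0.5, 0, 0), (0, 0.5, 0),
            (5, 5, 5), (5.5, 5, 5), (5, 5.5, 5)]
# Candidate 2-D maps: one keeps the clusters together, one mixes them.
good_map = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5)]
bad_map = [(0, 0), (5, 5), (0.5, 0), (5.5, 5), (0, 0.5), (5, 5.5)]

P = gaussian_affinities(high_dim)
kl_good = kl_divergence(P, student_t_affinities(good_map))
kl_bad = kl_divergence(P, student_t_affinities(bad_map))
print(kl_good < kl_bad)  # True: the neighborhood-preserving map costs less
```

The actual algorithm starts from a random or PCA-based map and moves the low-dimensional points by gradient descent on this cost.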

0.2. Shortcomings of t-SNE

  • it has hyperparameters that must be tuned, two of them being perplexity (roughly, the number of close neighbors each point is assumed to have) and epsilon (the learning rate of the optimization). One needs to try multiple configurations, and the results may show no clear relationship to the settings, even over gradual changes in the hyperparameters.
  • as it's a non-linear map, cluster sizes and the distances between clusters in the embedding cannot be reliably compared with those of the original distribution: local densities are brought to roughly the same level, and there is no single global transformation as there is in linear dimensionality reduction techniques.
  • randomness (noise) may show up as a pattern that isn't inherently present in the data; one needs to be wary of such emergent structure.
  • shapes and patterns may faithfully represent the original distribution only within a small range of perplexities. They may, however, still capture its overall layout: e.g. two parallel lines may be mapped as slightly warped, not quite parallel curves that still look somewhat similar. Again, this is a result of the non-linear nature of the map.
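
The density-equalizing effect mentioned above follows from how perplexity is used: each point gets its own Gaussian bandwidth, searched for so that its neighbor distribution has the requested perplexity. The sketch below shows this on made-up 1-D data; the function names, search bounds, and iteration count are assumptions for illustration only.

```python
import math

def perplexity_of(sq_dists, beta):
    """Perplexity 2**H of the Gaussian neighbor distribution, beta = 1/(2 sigma^2)."""
    d_min = min(sq_dists)  # shift for numerical stability; cancels on normalizing
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    entropy = -sum((w / total) * math.log2(w / total) for w in weights if w > 0)
    return 2 ** entropy

def calibrate_sigma(sq_dists, target_perplexity, iters=60):
    """Bisection (on a log scale) for the bandwidth matching the target perplexity."""
    lo, hi = 1e-10, 1e10  # assumed search bounds on beta
    for _ in range(iters):
        beta = math.sqrt(lo * hi)
        if perplexity_of(sq_dists, beta) > target_perplexity:
            lo = beta   # distribution too flat: narrow the kernel
        else:
            hi = beta   # distribution too peaked: widen the kernel
    return math.sqrt(1.0 / (2.0 * beta))

# Toy 1-D data: a dense group (spacing 0.1) and a sparse group (spacing 10).
dense = [0.1 * k for k in range(10)]
sparse = [100.0 + 10.0 * k for k in range(10)]
points = dense + sparse

def sq_dists_from(index):
    """Squared distances from one point to all the others."""
    return [(points[index] - x) ** 2 for j, x in enumerate(points) if j != index]

sigma_dense = calibrate_sigma(sq_dists_from(4), target_perplexity=5.0)
sigma_sparse = calibrate_sigma(sq_dists_from(14), target_perplexity=5.0)
print(sigma_dense, sigma_sparse)
```

The point in the sparse group ends up with a much wider kernel than the one in the dense group, so both "see" about the same number of neighbors. That is why the two groups can appear similarly sized in the final map even though their original spreads differ by a factor of 100.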

1. References

Tags::math: