Multi-Label Classification
- each feature vector has multiple prediction labels associated with it.
- classification is not exclusive: the task is to identify every applicable label rather than to pick a single one.
- this can be handled with the one-versus-rest strategy (see the first sketch after this list)
- with a new threshold hyperparameter
- if the probability for a class exceeds that threshold, we add this label to the overall label set for that feature vector
- this check is repeated for each of the possible labels
- neural networks handle this naturally by using a binary cross-entropy loss for each possible label (see the second sketch after this list)
- the overall loss is the average of these per-label losses, with a sigmoid activation on each unit of the output layer.
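
A minimal sketch of the one-versus-rest strategy with a probability threshold, using scikit-learn; the toy feature vectors, label sets, and the 0.5 threshold are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical feature vectors and their label *sets*.
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.8], [0.2, 0.1]])
y_sets = [{"photo", "portrait"}, {"painting"}, {"photo", "painting"}, {"portrait"}]

# Turn the label sets into a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_sets)

# One binary classifier per label (one-versus-rest).
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Threshold hyperparameter: a label joins the predicted set
# whenever its estimated probability exceeds this value.
threshold = 0.5
proba = clf.predict_proba(X)  # shape: (n_samples, n_labels)
pred_sets = [
    {label for label, p in zip(mlb.classes_, row) if p > threshold}
    for row in proba
]
print(pred_sets)
```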
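A minimal sketch of the neural-network formulation, assuming PyTorch: one output unit per label, a binary cross-entropy loss per label, and the average of those losses as the training loss. Layer sizes and the random data are illustrative.

```python
import torch
import torch.nn as nn

n_features, n_labels = 16, 5            # illustrative dimensions
model = nn.Sequential(
    nn.Linear(n_features, 32),
    nn.ReLU(),
    nn.Linear(32, n_labels),            # one logit per possible label
)

# BCEWithLogitsLoss applies the sigmoid internally and, with the default
# reduction="mean", averages the binary cross-entropy over all labels.
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, n_features)                   # a batch of 8 feature vectors
y = torch.randint(0, 2, (8, n_labels)).float()   # multi-hot target label sets

logits = model(x)
loss = criterion(logits, y)
loss.backward()

# At prediction time, apply the sigmoid and the same threshold idea as above.
probs = torch.sigmoid(logits)
predicted = probs > 0.5                 # boolean multi-hot prediction
```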
In some cases, it may be possible to decompose the label set into a Cartesian product of smaller label sets, where the labels within each set are mutually exclusive.
- see the example below
| Combination number | Label: set 1 | Label: set 2 | Actual label set |
|---|---|---|---|
| 1 | photo | portrait | {"photo", "portrait"} |
| 2 | photo | paysage | {"photo", "paysage"} |
| 3 | photo | other | {"photo", "other"} |
| 4 | painting | portrait | {"painting", "portrait"} |
| 5 | painting | paysage | {"painting", "paysage"} |
| 6 | painting | other | {"painting", "other"} |
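
A small sketch of this decomposition in plain Python: the two mutually exclusive label sets from the table are combined via their Cartesian product, and each combination can then be treated as one class of an ordinary multi-class classifier. The set contents follow the table above.

```python
from itertools import product

set_1 = ["photo", "painting"]
set_2 = ["portrait", "paysage", "other"]

# Each combination becomes one class of a standard multi-class problem.
combinations = list(product(set_1, set_2))
for i, (a, b) in enumerate(combinations, start=1):
    print(i, {a, b})
```

Equivalently, one could train a separate multi-class classifier for each mutually exclusive set and take the union of their predictions.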