Ensemble Algorithms
- instead of learning one super-accurate model, focus on training a large number of low-accuracy models (voters) and then combining the predictions of these weaker models to obtain a high-accuracy meta-model
- low-accuracy models are generally produced by weak learning algorithms: they're faster to train and faster at inference time.
- if each does at least slightly better than random guessing, then one can obtain a high-accuracy model by combining a large number of such weak models.
- weighted voting is a possibility
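A minimal sketch of weighted voting for binary classifiers, assuming predictions in {-1, +1} and per-model weights (e.g. validation accuracies); the function name and the numbers are illustrative, not from any particular library:

```python
import numpy as np

def weighted_vote(predictions, weights):
    """Combine binary predictions in {-1, +1} from several weak models.

    predictions: shape (n_models, n_samples)
    weights:     shape (n_models,), e.g. each model's validation accuracy
    """
    scores = np.dot(weights, predictions)  # weighted sum of votes per sample
    return np.sign(scores)                 # final decision by weighted majority

# three weak voters on four samples; the third voter carries the most weight
preds = np.array([[ 1, -1,  1,  1],
                  [ 1,  1, -1,  1],
                  [-1, -1,  1, -1]])
print(weighted_vote(preds, weights=np.array([0.6, 0.55, 0.7])))  # [ 1. -1.  1.  1.]
```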
1. Paradigms
1.1. Bagging
- train multiple weak models, each on a bootstrap sample of the training data (random resampling with replacement), and combine their predictions by averaging or voting
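A rough bagging sketch, assuming scikit-learn is available and X, y are NumPy arrays; shallow decision trees stand in for the weak models, and the function names and hyperparameter values are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, n_models=50, seed=0):
    """Fit n_models shallow trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)           # sample n rows with replacement
        tree = DecisionTreeRegressor(max_depth=3)  # deliberately weak base model
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    # averaging the individual predictions is what reduces variance
    return np.mean([m.predict(X) for m in models], axis=0)
```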
2. Some commonly used algorithms:
2.2. Gradient Boosting
- essentially building models sequentially, each one correcting the previous ones, rather than using them independently
- a rough outline of the procedure (for Regression); a code sketch follows these steps:
- build a model (say f0) and generate predictions
- identify residuals (actuals minus predictions) for each feature vector
- train a new model (say f1) on this set of feature vectors and residuals
- set the new predictor as (f0 + alpha*f1)
- alpha is a hyperparameter (learning rate)
- repeat with this new predictor as the new f0 until residuals have been reduced satisfactorily
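A from-scratch sketch of the loop above for squared-error regression, assuming scikit-learn's DecisionTreeRegressor as the weak model; the function names, hyperparameter values, and the stopping rule (a fixed number of rounds instead of checking residuals) are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, alpha=0.1, max_depth=2):
    f0 = np.mean(y)                  # initial model: just predict the mean
    preds = np.full(len(y), f0)
    models = []
    for _ in range(n_rounds):
        residuals = y - preds                        # what the current ensemble gets wrong
        f_i = DecisionTreeRegressor(max_depth=max_depth)
        f_i.fit(X, residuals)                        # next weak model targets the residuals
        preds += alpha * f_i.predict(X)              # new predictor = previous + alpha * f_i
        models.append(f_i)
    return f0, models

def gradient_boost_predict(f0, models, X, alpha=0.1):
    # use the same alpha as at fit time
    preds = np.full(len(X), f0)
    for m in models:
        preds += alpha * m.predict(X)
    return preds
```

Here max_depth and n_rounds correspond to the base-model power and iteration count mentioned below as the main levers against overfitting; alpha is the learning rate.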
- boosting reduces bias (helps target underfitting issues) rather than variance (which is what bagging helps with)
- boosting is prone to overfitting
- this can be controlled by tuning the base model's power (depth in trees, degree of the polynomial in regression, etc.) and the number of boosting iterations
- read more here: https://en.wikipedia.org/wiki/Gradient_boosting
- gradient boosting can be adapted to classification by modelling it as a logistic regression problem.
- the modifiable additive predictor lives in log-odds space (it becomes the exponent of e) and is wrapped with a sigmoid to produce probabilities; see the sketch below
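A simplified sketch of that adaptation under log-loss: the additive model F(x) is a log-odds score, the sigmoid turns it into a probability, and each weak model is fit to the pseudo-residuals y - p. The full algorithm also tunes leaf values, which is skipped here; names and hyperparameters are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def gb_classifier_fit(X, y, n_rounds=100, alpha=0.1, max_depth=2):
    """y holds 0/1 labels; the additive model F(x) is a log-odds score."""
    F = np.zeros(len(y))                    # start at log-odds 0, i.e. p = 0.5
    models = []
    for _ in range(n_rounds):
        p = sigmoid(F)                      # wrap the log-odds with a sigmoid
        pseudo_residuals = y - p            # negative gradient of the log-loss w.r.t. F
        f_i = DecisionTreeRegressor(max_depth=max_depth)
        f_i.fit(X, pseudo_residuals)
        F += alpha * f_i.predict(X)         # update stays additive, in log-odds space
        models.append(f_i)
    return models

def gb_classifier_predict_proba(models, X, alpha=0.1):
    F = np.zeros(len(X))
    for m in models:
        F += alpha * m.predict(X)
    return sigmoid(F)                       # probability of class 1
```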
- more powerful than random forests, but slower due to the sequential nature of building the model
Tags::ml:ai: