Feature Engineering
- preparing the dataset for use by the learning algorithm
- the goal is to convert the raw data into features with high predictive power and to make them usable in the first place
Some common feature engineering processes are:
1. One Hot Encoding
- converting a categorical variable into separate boolean features, one per category, as sketched below
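A minimal sketch in Common Lisp; one-hot-encode and its argument names are illustrative, not from any library:
(defun one-hot-encode (value categories)
  ;; Return a boolean vector with a 1 at VALUE's position in
  ;; CATEGORIES and a 0 everywhere else.
  (mapcar #'(lambda (category)
              (if (equal value category) 1 0))
          categories))
For example, (one-hot-encode 'red '(red green blue)) evaluates to (1 0 0).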
2. Entity Embeddings
- one-hot encoding might not be the best option when meaningful relationships do exist between the categories being considered; learned embeddings can represent such similarities in a dense vector space
- see https://arxiv.org/pdf/1604.06737.pdf
3. Binning (Bucketing)
- converting a continuous feature into multiple mutually exclusive boolean buckets based on value ranges
- 0 to 10, 10 to 20, and so on, for instance (see the sketch below)
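A minimal sketch, assuming half-open buckets defined by a list of upper cutoffs; bin-index is an illustrative name:
(defun bin-index (value cutoffs)
  ;; Return the index of the first bucket whose upper CUTOFF exceeds
  ;; VALUE; values beyond the last cutoff land in an overflow bucket.
  (or (position-if #'(lambda (cutoff) (< value cutoff)) cutoffs)
      (length cutoffs)))
For example, (bin-index 15 '(10 20 30)) evaluates to 1, i.e. the 10 to 20 bucket; the resulting index can then be one-hot encoded as above to obtain the exclusive booleans.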
4. Normalization
- converting varying numerical ranges into one standard range (-1 to 1 or 0 to 1)
- aids learning algorithms computationally (helps avoid precision and overflow problems)
(defun normalize (numerical-data-vector)
  ;; Rescale every feature to the 0 to 1 range via min-max scaling.
  (let* ((lo (reduce #'min numerical-data-vector))
         (hi (reduce #'max numerical-data-vector))
         (span (- hi lo)))
    (mapcar #'(lambda (feature) (/ (- feature lo) span))
            numerical-data-vector)))
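For example, (normalize '(10 20 30)) evaluates to (0 1/2 1); note that Common Lisp keeps exact rationals here, so coerce to floats if the downstream code expects them.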
5. Standardization
- aka z-score normalization
- rescaling features so that they have the properties of a standard normal distribution (zero mean, unit variance)
(defun standardize (numerical-data-vector)
  ;; Rescale every feature to zero mean and unit variance, using the
  ;; population variance (division by n).
  (let* ((n (length numerical-data-vector))
         (mu (/ (reduce #'+ numerical-data-vector) n))
         (sigma (sqrt (/ (reduce #'+ (mapcar #'(lambda (feature)
                                                 (expt (- feature mu) 2))
                                             numerical-data-vector))
                         n))))
    (mapcar #'(lambda (feature) (/ (- feature mu) sigma))
            numerical-data-vector)))
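For example, (standardize '(10.0 20.0 30.0)) evaluates to approximately (-1.2247 0.0 1.2247); dividing by n - 1 instead (the sample variance) is an equally common convention.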
6. Dealing with Missing Features
Possible approaches:
- removing examples with missing features
- using a learning algorithm that can deal with missing data
- data imputation techniques
7. Data Imputation Techniques
- replace the missing value by the mean, median, or a similar summary statistic (see the sketch at the end of this section)
- use something outside the normal range to indicate imputation (-1 when the feature normally spans 2 to 5, for instance)
- use something chosen according to the range rather than a statistic (0 for a feature spanning -1 to 1, for instance)
A more advanced approach is to model the imputation as a regression problem before proceeding with the actual task: all the other features are used to predict the missing one.
With a large dataset, one can instead introduce an extra indicator feature to signify missing data and then fill in a value of choice.
- test more than one technique and proceed with whichever suits the task best
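A minimal sketch of mean replacement combined with the indicator-feature idea, assuming missing values are marked as nil; impute-with-mean is an illustrative name:
(defun impute-with-mean (numerical-data-vector)
  ;; Replace each NIL with the mean of the observed values and return
  ;; a second vector marking imputed entries (1) versus observed (0).
  (let* ((present (remove nil numerical-data-vector))
         (mu (/ (reduce #'+ present) (length present))))
    (values
     (mapcar #'(lambda (feature) (or feature mu)) numerical-data-vector)
     (mapcar #'(lambda (feature) (if feature 0 1)) numerical-data-vector))))
For example, (impute-with-mean '(2 nil 4)) returns (2 3 4) and the indicator (0 1 0).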