Feature Engineering

  • preparing the dataset for use by the learning algorithm
  • the goal is to convert raw data into features with high predictive power and to make them usable in the first place

Some common feature engineering processes are:

1. One Hot Encoding

  • converting a categorical feature into separate boolean features, one per category
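
A minimal sketch (the function name one-hot and the representation of the known categories as a list are illustrative assumptions, not from the text):

(defun one-hot (value categories)
  ;; Encode a categorical value as a boolean vector with one
  ;; slot per known category.
  (mapcar #'(lambda (category)
              (if (equal value category) 1 0))
          categories))

;; (one-hot "red" '("red" "green" "blue")) => (1 0 0)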

2. Entity Embeddings

  • mapping each category to a dense, learned vector of continuous values (instead of a sparse boolean encoding)

3. Binning (Bucketing)

  • converting a continuous feature into multiple mutually exclusive boolean buckets (based on value ranges)
    • 0 to 10, 10 to 20, and so on… , for instance.
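
As a sketch, a value can be mapped to the index of its bucket (the helper name bucket-index and the ascending-upper-bounds representation are assumptions):

(defun bucket-index (value upper-bounds)
  ;; Return the index of the first bucket whose upper bound
  ;; exceeds VALUE; values past the last bound share a final bucket.
  (or (position-if #'(lambda (bound) (< value bound)) upper-bounds)
      (length upper-bounds)))

;; (bucket-index 15 '(10 20 30)) => 1, i.e. the 10 to 20 bucket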

4. Normalization

  • converting varying numerical ranges into a standard range (-1 to 1 or 0 to 1)
  • aids learning algorithms computationally (avoids precision and overflow discrepancies)
(defun normalize (numerical-data-vector)
  ;; Rescale values to the 0 to 1 range (min-max normalization).
  (let* ((min (reduce #'min numerical-data-vector))
         (max (reduce #'max numerical-data-vector))
         (span (- max min)))
    (mapcar #'(lambda (feature)
                (/ (- feature min)
                   span))
            numerical-data-vector)))

5. Standardization

  • aka z-score normalization
  • rescaling features so that they have the properties of a standard normal distribution (zero mean, unit variance)
(defun standardize (numerical-data-vector)
  ;; Rescale values to zero mean and unit variance (z-scores).
  (let* ((n (length numerical-data-vector))
         (mu (/ (reduce #'+ numerical-data-vector) n))
         (sigma (sqrt (/ (reduce #'+ (mapcar #'(lambda (x)
                                                 (expt (- x mu) 2))
                                             numerical-data-vector))
                         n))))
    (mapcar #'(lambda (feature)
                (/ (- feature mu)
                   sigma))
            numerical-data-vector)))

6. Dealing with Missing Features

Possible approaches:

  • removing examples with missing features
  • using a learning algorithm that can deal with missing data
  • data imputation techniques

7. Data Imputation Techniques

  • replace by the mean, median, or a similar statistic
  • use something outside the normal range to indicate imputation (e.g. -1 when values normally fall in the 2 to 5 range)
  • use something chosen according to the range rather than a statistic (e.g. 0 for a -1 to 1 range)
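
A sketch of the first technique, assuming missing values are represented as NIL:

(defun impute-mean (data-vector)
  ;; Replace missing values (NIL) with the mean of the observed values.
  (let* ((present (remove nil data-vector))
         (mu (/ (reduce #'+ present) (length present))))
    (mapcar #'(lambda (x) (or x mu)) data-vector)))

;; (impute-mean '(2 nil 4)) => (2 3 4)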

A more advanced approach is to model the imputation as a regression problem before proceeding with the actual task. In this case, all the other features are used to predict the missing one.

With a large dataset, one can introduce an extra indicator feature to signify missing data and then fill in a placeholder value of choice.
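
A sketch of that idea, again assuming NIL marks a missing value; each value becomes a (value indicator) pair:

(defun with-missing-indicator (data-vector placeholder)
  ;; Pair each value with an indicator: 1 if it was missing
  ;; (and replaced by PLACEHOLDER), 0 otherwise.
  (mapcar #'(lambda (x)
              (if x (list x 0) (list placeholder 1)))
          data-vector))

;; (with-missing-indicator '(2 nil 4) 0) => ((2 0) (0 1) (4 0))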

  • test more than one technique and proceed with whichever suits best
Tags::ml:ai: