machine-learning-nanodegree

Class notes for the Machine Learning Nanodegree at Udacity

Go to Index

Feature Engineering

Feature Scaling

Feature Scaling Formula
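
For reference, min-max rescaling maps every value of a feature into the [0, 1] range:

x' = (x - x_min) / (x_max - x_min)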

Min Max Rescaler Coding Quiz

""" quiz materials for feature scaling clustering """

### FYI, the most straightforward implementation might 
### throw a divide-by-zero error, if the min and max
### values are the same
### but think about this for a second--that means that every
### data point has the same value for that feature!  
### why would you rescale it?  Or even use it at all?
def featureScaling(arr):
    min_x = min(arr)
    value_range = max(arr) - min_x
    
    if value_range == 0:
        return [1 for x in arr]
    
    return [float(x - min_x) / value_range for x in arr]

# tests of your feature scaler--line below is input data
data = [115, 140, 175]
print(featureScaling(data))

Min Max Rescaler in sklearn

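A minimal sketch of the same rescaling using sklearn's MinMaxScaler (the values are the quiz input above; sklearn expects a 2-D array with one column per feature):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# same values as the quiz above, reshaped to a single-feature column
data = np.array([[115.], [140.], [175.]])
scaler = MinMaxScaler()
print(scaler.fit_transform(data))  # [[0.], [0.41666667], [1.]]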

Quiz: Algorithms affected by feature rescaling

Answer: SVM and K-means are affected by feature rescaling. Take, for instance, the example where weight and height were rescaled so that their contributions to the outcome would be comparable (i.e., both between 0 and 1). Because SVM and K-means compute distances between points, rescaling the features changes those distances and therefore changes the result.

In contrast, Decision Trees and Linear Regression do not measure distances. A Decision Tree picks, for each feature, a threshold value at which to split the data; if the feature is rescaled, the threshold is rescaled by the same amount and the resulting splits are unchanged. Similarly, Linear Regression learns a separate coefficient for each feature, so rescaling a feature simply rescales its coefficient and the predictions stay the same.
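
This can be checked directly. Below is a small sketch (the weight/height numbers are made up for illustration): K-means cluster assignments may change once the features are rescaled, while a decision tree's predictions do not, because its split thresholds rescale along with the data.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# toy data: [weight (lbs), height (inches)] -- raw distances are dominated by weight
X = np.array([[115., 63.], [140., 70.], [175., 75.], [120., 65.]])
y = np.array([0, 0, 1, 1])
X_scaled = MinMaxScaler().fit_transform(X)

# distance-based: cluster assignments can differ between raw and scaled data
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled))

# tree-based: predictions are identical, the split thresholds just rescale
print(DecisionTreeClassifier(random_state=0).fit(X, y).predict(X))
print(DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled))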

Feature Selection

Why Feature Selection?

How hard is feature selection?

Feature selection can be framed as a function that maps the original N features to a subset of M features:

F(N) -> M, where M ≤ N

To select the M relevant features out of the N available, without knowing M in advance, we would have to try every subset of the N features. For a fixed M that is "N choose M" combinations, and since M is unknown we have to consider every subset size, which gives 2^N possibilities.
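
As a quick sanity check on that count, here is a short sketch (the feature names are placeholders) that enumerates every subset of N = 4 features:

from itertools import combinations

features = ["height", "weight", "age", "income"]  # N = 4, names are illustrative
subsets = [c for r in range(len(features) + 1)
           for c in combinations(features, r)]
print(len(subsets))  # 16 == 2**4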

Filtering and Wrapping Overview

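In short, a filter ranks features with a fast criterion computed independently of the learning algorithm, while a wrapper searches feature subsets by repeatedly training the learner itself. A minimal sklearn sketch of both (synthetic data; the particular scorer and estimator are just example choices):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# filter: score each feature on its own, ignoring the learner
filt = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("filter keeps:", filt.get_support(indices=True))

# wrapper: let the learner evaluate features, dropping the weakest each round
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print("wrapper keeps:", wrap.get_support(indices=True))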

Filtering and Wrapping Comparison


Filtering and Wrapping Comparison 2

Minimum Features Quiz


Feature Relevance


  1. A feature xi is strongly relevant if removing it degrades the Bayes Optimal Classifier (BOC).
  2. A feature xi is weakly relevant if:
    • it is not strongly relevant, and
    • there exists a subset of features S such that adding xi to S improves the BOC (illustrated in the sketch below).
  3. Otherwise, the feature xi is irrelevant.
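
A classic illustration of weak relevance is a duplicated feature. Below is a small sketch (toy data; an ordinary classifier stands in for the Bayes Optimal Classifier): neither copy of the informative feature is strongly relevant, since dropping one of them loses nothing, yet each copy is weakly relevant, since adding it to a subset that lacks the other copy clearly helps.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.normal(size=200)        # informative feature
noise = rng.normal(size=200)    # irrelevant feature
y = (x > 0).astype(int)

subsets = {
    "copy1 + copy2 + noise": np.column_stack([x, x, noise]),
    "copy1 + noise":         np.column_stack([x, noise]),
    "noise only":            noise.reshape(-1, 1),
}
for name, X in subsets.items():
    print(name, round(cross_val_score(GaussianNB(), X, y, cv=5).mean(), 2))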

Feature Relevance vs Usefulness
