machine-learning-nanodegree

Class notes for the Machine Learning Nanodegree at Udacity

Classification Learning

Terms

Decision Trees: Learning

The decision tree learning algorithm proceeds as follows:

  1. Pick the attribute that splits the data best.
  2. Ask the question associated with that attribute.
  3. Follow the answer path.
  4. Go to 1, until the answer is found.

ID3

A top-down learning algorithm: starting from the root, it greedily picks the attribute that splits the data best at each node and recurses on the resulting subsets.

ID3 Algorithm
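
The course presents ID3 as pseudocode; the following is a minimal, hedged Python sketch of the same idea. Discrete attributes are represented as dicts, entropy and information gain are as defined later in these notes, and the names id3 and information_gain are my own, not the course's:

import math
from collections import Counter

def entropy(labels):
    # -sum(p_i * log2(p_i)) over the classes present in the sample
    total = len(labels)
    return -sum((n / total) * math.log(n / total, 2)
                for n in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Entropy(S) minus the weighted entropy of each subset S_v
    total = len(labels)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [l for ex, l in zip(examples, labels) if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    # all examples share one label: return a leaf
    if len(set(labels)) == 1:
        return labels[0]
    # no attributes left: return the majority label
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # pick the attribute that splits the data best...
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # ...and recurse on each answer path
    for value in set(ex[best] for ex in examples):
        sub = [(ex, l) for ex, l in zip(examples, labels) if ex[best] == value]
        tree[best][value] = id3([ex for ex, _ in sub],
                                [l for _, l in sub], remaining)
    return tree

# toy usage (data invented for illustration):
examples = [{'weather': 'sunny'}, {'weather': 'rainy'},
            {'weather': 'sunny'}, {'weather': 'rainy'}]
print(id3(examples, ['F', 'S', 'F', 'S'], ['weather']))
# {'weather': {'sunny': 'F', 'rainy': 'S'}}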

Bias of ID3

ID3's inductive bias is a preference bias: it prefers shorter trees, and in particular trees that place high-information-gain attributes closest to the root.

Decision Trees: Other considerations

Dealing with overfitting

Overfitting can be limited either by stopping the tree from growing too far (for example, by requiring a minimum number of samples before a split, as with the min_samples_split parameter used below) or by pruning the tree after it has been built.

For continuous attributes

Continuous attributes can be handled by asking threshold questions (for example, age >= 30), which turn a continuous value into a binary split; unlike a discrete attribute, the same continuous attribute can be tested again further down the tree with a different threshold.

Pruning

Pruning is a technique in machine learning that reduces the size of a decision tree by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier and hence improves predictive accuracy by reducing overfitting.
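
scikit-learn (0.22 and later) exposes minimal cost-complexity pruning through the ccp_alpha parameter. A hedged sketch, reusing the features_train / labels_train variables from the accuracy example below:

from sklearn.tree import DecisionTreeClassifier

# compute the effective alphas along the pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    features_train, labels_train)

# larger ccp_alpha values prune more aggressively; in practice one
# picks the alpha that scores best on held-out data
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    clf = clf.fit(features_train, labels_train)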

Decision Trees: Wrap up

More decision trees

Decision tree graph representation

Decision Trees Accuracy

from sklearn import tree

# features_train / labels_train etc. are assumed to come from an
# earlier train/test split (see the sketch after this block)
clf = tree.DecisionTreeClassifier(min_samples_split=10)
clf = clf.fit(features_train, labels_train)

from sklearn.metrics import accuracy_score
acc = accuracy_score(labels_test, clf.predict(features_test))
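
For completeness, a hedged sketch of how the features_train / labels_train variables above might be produced; the iris toy dataset is used here only as a stand-in and is not part of the original notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
features_train, features_test, labels_train, labels_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)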

Entropy

Entropy measures the impurity of a sample of examples: entropy = -Σ_i p_i * log2(p_i), where p_i is the fraction of examples belonging to class i. It is 0 for a pure sample and maximal (1 bit, for two classes) for an evenly split one.

Calculating Entropy Example

Suppose we have a sample like: S S F F, where S is slow and F is fast. From this sample we can infer that the fraction of slow examples, p_slow, is 0.5 (and likewise p_fast = 0.5).

So the entropy could be calculated like:

import math

# two classes, each with p = 0.5:
# -0.5 * log2(0.5) - 0.5 * log2(0.5) = 0.5 + 0.5 = 1
entropy = 2 * ((-0.5) * math.log(0.5, 2))  # results in 1.0
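
The same calculation generalized to any labeled sample (a small sketch; this is the same entropy helper used in the ID3 sketch above):

import math
from collections import Counter

def entropy(labels):
    # -sum(p_i * log2(p_i)) over the classes present in the sample
    total = len(labels)
    return -sum((n / total) * math.log(n / total, 2)
                for n in Counter(labels).values())

print(entropy(['S', 'S', 'F', 'F']))  # 1.0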

Information Gain

Information gain measures how much splitting a sample S on an attribute A reduces entropy: Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) * Entropy(S_v), where the sum runs over the values v of A and S_v is the subset of S for which A = v.
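
Continuing the slow/fast example, a worked sketch of the gain from a hypothetical binary attribute that splits S S F F into {S, S, F} and {F}; the attribute and split are invented for illustration:

import math

def binary_entropy(p):
    # -p*log2(p) - (1-p)*log2(1-p), with the convention 0*log(0) = 0
    if p in (0, 1):
        return 0.0
    return -p * math.log(p, 2) - (1 - p) * math.log(1 - p, 2)

parent = binary_entropy(0.5)     # entropy of S S F F = 1.0
left = binary_entropy(2 / 3)     # {S, S, F}: about 0.918
right = binary_entropy(1.0)      # {F}: pure, so 0.0
gain = parent - (3 / 4 * left + 1 / 4 * right)
print(gain)  # about 0.311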

sklearn.tree DecisionTreeClassifier default criterion

Scikit-learn uses Gini impurity as the default criterion for its decision tree classifiers. To use the entropy criterion instead, do the following:

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='entropy')
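
In practice, Gini impurity and entropy usually produce very similar trees; Gini is marginally cheaper to compute because it avoids the logarithm.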