machine-learning-nanodegree

Class notes for the Machine Learning Nanodegree at Udacity

Go to Index

Clustering

K-Means

K-Means Algorithm

  1. Add K centroids to the data at random positions.

Add centroids

  1. Associate each data point to the closest centroid (aka association step)

Associate step

  1. Move the centroids to the mean distance between all associated points

Move centroids

  1. Repeat step 2 and 3 n times, or until some other stop-condition has been met.

K-Means is not deterministic

The initial position of the centroids will influence the final outcome of the algorithm. See the example below:

uniform 1

uniform 2

To solve this problem, we run the algorithm multiple times and average the results.

K-Means and sklearn

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, 
                             tol=0.0001, precompute_distances='auto', verbose=0, 
                             random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

K-means references

Single Linkage Clustering

Single Linkage Clustering Algorithm

Single Linkage Clustering

Soft Clustering

soft clustering

Expectation Maximization

Expectation Maximization formula

Expectation Maximization Properties

Expectation Maximization Properties

Clustering Properties

Clustering Properties

Impossibility Theorem

Impossibility Theorem

Summary

Summary