Class notes for the Machine Learning Nanodegree at Udacity
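K-means places K centroids in the data at random positions.
It then associates each point with its nearest centroid, moves each centroid to the mean of its points, and repeats n times, or until some other stop-condition has been met.
The initial position of the centroids will influence the final outcome of the algorithm. See the example below: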
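To mitigate this problem, we run the algorithm multiple times from different random initial positions and keep the best result. In scikit-learn, that is the run with the lowest inertia (the sum of squared distances from each point to its nearest centroid), not an average of the runs.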
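The steps above can be sketched in plain Python. This is a minimal illustration and not scikit-learn's implementation: the names `kmeans_once` and `kmeans_restarts` are hypothetical, the toy data is made up, and starting each centroid on a randomly chosen data point is an assumption (one common way to pick "random positions").

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    # Associate each point with the index of its nearest centroid.
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def kmeans_once(points, k, n_iter=100, seed=None):
    """One k-means run from a random start; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    # Assumption: each centroid starts on a randomly chosen data point.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its points (leave it if it has none).
        moved = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                moved.append(centroids[j])
        if moved == centroids:  # stop-condition: no centroid moved
            break
        centroids = moved
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[label])
                  for p, label in zip(points, labels))
    return centroids, labels, inertia


def kmeans_restarts(points, k, n_init=10, seed=0):
    # Run from n_init different starts and keep the run with the lowest inertia.
    runs = [kmeans_once(points, k, seed=seed + i) for i in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy data: three 2-D blobs of 50 points each.
data_rng = random.Random(42)
points = [(data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(50)]

# Inertia of single runs from different starts, then the best of ten restarts.
print(sorted(round(kmeans_once(points, 3, seed=s)[2], 2) for s in range(10)))
centroids, labels, inertia = kmeans_restarts(points, 3)
print(round(inertia, 2))
```

If the single-run inertias printed first are not all equal, some starting positions ended in a worse clustering than others. The restart wrapper never returns a result worse than any of its individual runs.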
class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300,
tol=0.0001, precompute_distances='auto', verbose=0,
random_state=None, copy_x=True, n_jobs=1, algorithm='auto')
n_clusters: number of centroids to initialize. Also defines the number of clusters to be found. This should be set using domain knowledge of the problem.
max_iter: maximum number of iterations (associate points, move centroids, repeat) to be run in a single run.
n_init: number of times the algorithm will be run with different initial centroids before outputting the results.
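As a hedged usage sketch, the three parameters above can be passed to `KMeans` as follows. The toy data and the variable names are made up for illustration. Only arguments that should be accepted by both older and recent scikit-learn releases are passed, since some arguments in the signature quoted above may not exist in newer versions.

```python
import random

from sklearn.cluster import KMeans

# Toy data: three 2-D blobs of 50 points each (illustrative only).
data_rng = random.Random(0)
X = [[data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)]
     for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(50)]

model = KMeans(
    n_clusters=3,    # number of centroids / clusters to find
    n_init=10,       # number of runs from different initial centroids
    max_iter=300,    # cap on iterations within a single run
    random_state=0,  # fixed seed so the example is repeatable
)
labels = model.fit_predict(X)

print(model.cluster_centers_)  # final centroid positions
print(model.inertia_)          # sum of squared distances to the nearest centroid
print(model.n_iter_)           # iterations used by the best run
```

Here `n_clusters=3` matches how the toy data was generated. On real data that value would come from domain knowledge, as noted above.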