public class KMeansCluster
extends java.lang.Object
The term "K-Means" Algorithm was officially introduced in 1967 by James MacQueen. It is also called Lloyd's Algorithm (Stuart Lloyd in 1957).
The idea behind the Algorithm is that partitions or clusters can be created from a given dataset based on the differences of each observation with the mean of each cluster. The logic is to minimize these squared differences .
The exact equation F that minimizes the these differences is given from the following equation:
Min(F)=ÓjÓi||xi-ìj||2
The term ||xi-ìj||refers to the difference (or distance) between the specific data point xi and the center (or mean) of cluster j as ìj
The importance of K-Means in data mining is that the ability to "group" the data is not dependent on some other specific goal (e.g. based on the propensity to be good/bad or the income etc).
| Constructor and Description |
|---|
KMeansCluster() |
| Modifier and Type | Method and Description |
|---|---|
void |
CreateKClusters(double[][] DataPoints,
int K,
int MaxIterations,
double Precision) |
java.lang.String[] |
GetClassification() |
double[][] |
Getcluster_means() |
double[][] |
GetKFuzzyClassifixationProbabilities() |
double |
GetMinimumSquaredDifference() |
public void CreateKClusters(double[][] DataPoints,
int K,
int MaxIterations,
double Precision)
DataPoints - : is the matrix of observations with x rows and n columns [x][n]K - : number of desired clustersMaxIterations - : Number of iterations to stop the algorithmPrecision - : minimum proportional change of means to stop the algorithm
This is the basic method of the KMeansCluster class that will find the optimum points for the K chosen clusters
public double GetMinimumSquaredDifference()
public double[][] Getcluster_means()
public java.lang.String[] GetClassification()
public double[][] GetKFuzzyClassifixationProbabilities()