This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
An efficient k-means clustering algorithm: analysis and implementation
5,620
Citations
6
Authors
2002
Year
Abstract
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
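To make the objective in the abstract concrete, the following is a minimal sketch of the plain Lloyd iteration: assign each point to its nearest center, then move each center to the mean of its assigned points. It is not the filtering algorithm from the paper, since it uses a brute-force nearest-center search instead of a kd-tree. The function names (`lloyd_kmeans`, `mean_squared_distance`), the random initialization, and the iteration cap are assumptions made for illustration only.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd iteration with brute-force assignment (illustrative only).

    Assumption: initial centers are k distinct data points chosen at random.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centroid = tuple(
                    sum(m[i] for m in members) / len(members) for i in range(dim)
                )
                new_centers.append(centroid)
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Converged: no center moved.
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found = lloyd_kmeans(data, k=2)
    print(sorted(found), mean_squared_distance(data, found))
```

Each assignment step in this sketch costs on the order of n·k distance computations. According to the abstract, the filtering algorithm uses a kd-tree as its only major data structure, and its running time improves as the separation between clusters increases.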
Related works
Visualizing Data using t-SNE
2008 · 35,711 citations
Data mining: concepts and techniques
2012 · 28,872 citations
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
1987 · 20,269 citations
A density-based algorithm for discovering clusters in large spatial databases with noise
1996 · 19,133 citations
The WEKA data mining software
2009 · 17,823 citations