

k-means clustering

In statistics and machine learning, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data, and both employ an iterative refinement approach.

 

Description

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k < n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS):

\[
\underset{\mathbf{S}}{\operatorname{arg\,min}} \; \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in S_i} \left\| \mathbf{x}_j - \boldsymbol{\mu}_i \right\|^2
\]

where μi is the mean of points in Si.
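As a concrete illustration, the WCSS objective can be computed directly from a candidate partition and its means. This is a minimal plain-Python sketch; the function name `wcss` and the toy data are my own, not from the original post:

```python
def wcss(clusters, means):
    """Within-cluster sum of squares: for each cluster, sum the squared
    Euclidean distances from its points to the cluster mean."""
    total = 0.0
    for points, mu in zip(clusters, means):
        for x in points:
            total += sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return total

# Two 1-D clusters with means 1.0 and 4.0; each point is 1 away from
# its mean, so the objective is 4 * 1^2 = 4.0.
clusters = [[(0.0,), (2.0,)], [(3.0,), (5.0,)]]
means = [(1.0,), (4.0,)]
print(wcss(clusters, means))  # 4.0
```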

 

Algorithms

Regarding computational complexity, the k-means clustering problem is:

  • NP-hard in a general Euclidean space of d dimensions, even for 2 clusters [4][5]
  • NP-hard for a general number of clusters k, even in the plane [6]
  • If k and d are fixed, the problem can be solved exactly in time O(n^{dk+1} log n), where n is the number of entities to be clustered [7]

Thus, a variety of heuristic algorithms are generally used.

 

Since the problem is NP-hard in general, heuristic methods are what we typically use in practice.

Standard algorithm

The most common algorithm uses an iterative refinement technique.

Due to its ubiquity it is often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in the computer science community.

Given an initial set of k means m1(1),…,mk(1), which may be specified randomly or by some heuristic, the algorithm proceeds by alternating between two steps:[8]

Assignment step: Assign each observation to the cluster with the closest mean, i.e. partition the observations according to the Voronoi diagram generated by the means. (Equivalently, this step partitions the space into k Voronoi cells; the norm here is the 2-norm, i.e. Euclidean distance, which is what makes the Voronoi correspondence hold.)
\[
S_i^{(t)} = \left\{ \mathbf{x}_j : \big\| \mathbf{x}_j - \mathbf{m}^{(t)}_i \big\| \leq \big\| \mathbf{x}_j - \mathbf{m}^{(t)}_{i^*} \big\| \text{ for all } i^* = 1, \ldots, k \right\}
\]
 
Update step: Calculate the new means to be the centroid of the observations in the cluster.
\[
\mathbf{m}^{(t+1)}_i = \frac{1}{|S^{(t)}_i|} \sum_{\mathbf{x}_j \in S^{(t)}_i} \mathbf{x}_j
\]
(Recompute the means.)

The algorithm is deemed to have converged when the assignments no longer change.
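The two alternating steps above can be sketched in plain Python. This is a minimal illustration under my own assumptions (function names `kmeans` and `dist2`, random points as the initial means), not a production implementation:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm sketch: alternate assignment and update steps
    until the assignments no longer change."""
    rng = random.Random(seed)
    means = rng.sample(points, k)   # initial means: k distinct random points
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assign = [min(range(k), key=lambda i: dist2(p, means[i]))
                      for p in points]
        if new_assign == assign:    # converged: no assignment changed
            break
        assign = new_assign
        # Update step: each mean becomes the centroid of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assign

# Two tight pairs of points; they end up in separate clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
means, assign = kmeans(points, 2)
```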

 

The overall flow of the algorithm is just these two alternating steps, as illustrated in the figure above.

 

As it is a heuristic algorithm, there is no guarantee that it will converge to the global optimum, and the result may depend on the initial clusters. As the algorithm is usually very fast, it is common to run it multiple times with different starting conditions. However, in the worst case, k-means can be very slow to converge: in particular it has been shown that there exist certain point sets, even in 2 dimensions, on which k-means takes exponential time, that is 2^{Ω(n)}, to converge [9][10]. These point sets do not seem to arise in practice: this is corroborated by the fact that the smoothed running time of k-means is polynomial [11].

The worst-case running time is thus 2^{Ω(n)}, but in practice the algorithm generally behaves like a polynomial-time one.

The "assignment" step is also referred to as expectation step, the "update step" as maximization step, making this algorithm a variant of the generalized expectation-maximization algorithm.

Variations

  • The expectation-maximization algorithm (EM algorithm) maintains probabilistic assignments to clusters, instead of deterministic assignments, and multivariate Gaussian distributions instead of means.
  • k-means++ seeks to choose better starting clusters.
  • The filtering algorithm uses kd-trees to speed up each k-means step.[12]
  • Some methods attempt to speed up each k-means step using coresets[13] or the triangle inequality.[14]
  • Escape local optima by swapping points between clusters.[15]
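Of these variations, the k-means++ seeding rule is easy to state concretely: the first center is chosen uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal plain-Python sketch (the function name and data are my own, hypothetical choices):

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding sketch: the first center is uniform at random;
    each further center is drawn with probability proportional to its
    squared distance to the nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:                 # degenerate: all points are centers
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)      # r lies in [0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:                # weight-proportional selection
                centers.append(p)
                break
    return centers

# Two well-separated pairs: the two seeds land in different pairs
# with high probability, unlike uniform random seeding.
centers = kmeans_pp_init([(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (10.2, 10.0)], 2)
```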

Discussion

[Figure: Iris Flowers Clustering kMeans.svg] k-means clustering result for the Iris flower data set and actual species, visualized using ELKI. Cluster means are marked using larger, semi-transparent symbols.

[Figure: ClusterAnalysis Mouse.svg] k-means clustering and EM clustering on an artificial dataset ("mouse"). The tendency of k-means to produce equal-sized clusters leads to bad results here, while EM benefits from the Gaussian distributions present in the data set.

The two key features of k-means which make it efficient are often regarded as its biggest drawbacks:

A key limitation of k-means is its cluster model. The concept is based on spherical clusters that are separable in such a way that the mean value converges towards the cluster center. The clusters are expected to be of similar size, so that the assignment to the nearest cluster center is the correct assignment. When, for example, applying k-means with a value of k = 3 to the well-known Iris flower data set, the result often fails to separate the three Iris species contained in the data set. With k = 2, the two visible clusters (one containing two species) will be discovered, whereas with k = 3 one of the two clusters will be split into two even parts. In fact, k = 2 is more appropriate for this data set, despite the data set containing 3 classes. As with any other clustering algorithm, the k-means result relies on the data set satisfying the assumptions made by the algorithm. It works very well on some data sets, while failing miserably on others.

The result of k-means can also be seen as the Voronoi cells of the cluster means. Since data is split halfway between cluster means, this can lead to suboptimal splits as can be seen in the "mouse" example. The Gaussian models used by the Expectation-maximization algorithm (which can be seen as a generalization of k-means) are more flexible here by having both variances and covariances. The EM result is thus able to accommodate clusters of variable size much better than k-means as well as correlated clusters (not in this example).

 

This post is a conceptual introduction; code and a paper on optimizing k-means will follow:

Fast Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing

posted on 2010-10-19 18:57 Sosi 閱讀(1606) 評論(0)  編輯 收藏 引用 所屬分類: Courses
