
O(1) 的小樂


k-means clustering

In statistics and machine learning, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data, and in the iterative refinement approach employed by both algorithms.

 

Description

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k < n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS):

\underset{\mathbf{S}} \operatorname{arg\,min} \sum_{i=1}^{k} \sum_{\mathbf x_j \in S_i} \left\| \mathbf x_j - \boldsymbol\mu_i \right\|^2

where μ_i is the mean of the points in S_i.
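The WCSS objective above can be computed directly from data, labels, and cluster means. The sketch below is a minimal NumPy illustration; the function name and the toy data are made up for the example:

```python
import numpy as np

def wcss(X, labels, centers):
    """Within-cluster sum of squares: total squared Euclidean
    distance from each observation to its assigned cluster mean."""
    return sum(
        np.sum((X[labels == i] - mu) ** 2)
        for i, mu in enumerate(centers)
    )

# Toy data: two clusters of two points each (made-up example values).
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 11.0]])
print(wcss(X, labels, centers))  # 4.0 (each point is distance 1 from its mean)
```

This is the quantity the algorithm below monotonically decreases at every step.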

 

Algorithms

Regarding computational complexity, the k-means clustering problem is:

  • NP-hard in a general Euclidean space of dimension d, even for 2 clusters [4][5]
  • NP-hard for a general number of clusters k, even in the plane [6]
  • If k and d are fixed, the problem can be solved exactly in time O(n^{dk+1} log n), where n is the number of entities to be clustered [7]

Thus, a variety of heuristic algorithms are generally used.

 

Note that the problem is NP-hard in general, so in practice heuristic methods are usually used.

Standard algorithm

The most common algorithm uses an iterative refinement technique.

Due to its ubiquity it is often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in the computer science community.

Given an initial set of k means m1(1),…,mk(1), which may be specified randomly or by some heuristic, the algorithm proceeds by alternating between two steps:[8]

Assignment step: Assign each observation to the cluster with the closest mean (i.e. partition the observations according to the Voronoi diagram generated by the means; the norm here is the 2-norm, i.e. Euclidean distance, which is what makes the resulting partition a Voronoi diagram).
S_i^{(t)} = \left\{ \mathbf x_j : \big\| \mathbf x_j - \mathbf m^{(t)}_i \big\| \leq \big\| \mathbf x_j - \mathbf m^{(t)}_{i^*} \big\| \text{ for all }i^*=1,\ldots,k \right\}
 
Update step: Calculate the new means to be the centroid of the observations in the cluster.
\mathbf m^{(t+1)}_i = \frac{1}{|S^{(t)}_i|} \sum_{\mathbf x_j \in S^{(t)}_i} \mathbf x_j
That is, recompute the means from the current partition.

The algorithm is deemed to have converged when the assignments no longer change.
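The two alternating steps above can be sketched directly in NumPy. This is a minimal illustration of the standard algorithm under random initialization, not a production implementation; the function name and toy data are my own:

```python
import numpy as np

def kmeans(X, k, rng=None, max_iter=100):
    """Plain k-means (Lloyd's iteration): alternate the assignment
    and update steps until the assignments stop changing."""
    rng = np.random.default_rng(rng)
    # Initialize the means with k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: nearest mean in squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: the partition no longer changes
        labels = new_labels
        # Update step: each mean becomes the centroid of its cluster.
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels

# Two well-separated toy groups: points 0,1 end up in one cluster,
# points 2,3 in the other, regardless of which points seed the means.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans(X, k=2, rng=0)
```

Note that convergence is checked on the assignments, matching the stopping rule stated above.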

 

The flow of the whole algorithm is simply to alternate these two steps until convergence.

 

As it is a heuristic algorithm, there is no guarantee that it will converge to the global optimum, and the result may depend on the initial clusters. As the algorithm is usually very fast, it is common to run it multiple times with different starting conditions. However, in the worst case, k-means can be very slow to converge: in particular it has been shown that there exist certain point sets, even in 2 dimensions, on which k-means takes exponential time, that is 2^{Ω(n)}, to converge[9][10]. These point sets do not seem to arise in practice: this is corroborated by the fact that the smoothed running time of k-means is polynomial[11].

The worst-case running time is thus 2^{Ω(n)}, but in practice k-means generally behaves like a polynomial-time algorithm.
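Because the result depends on the initialization, the common safeguard mentioned above is to run the algorithm several times and keep the solution with the lowest WCSS. A minimal sketch of that restart loop (the helper names and toy data are my own):

```python
import numpy as np

def lloyd(X, k, rng):
    """One k-means run from a random initialization;
    returns (centers, labels, wcss)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(100):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        new = np.array([X[labels == i].mean(0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break  # means stopped moving
        centers = new
    return centers, labels, d2[np.arange(len(X)), labels].sum()

def kmeans_restarts(X, k, n_init=10, seed=0):
    """Run k-means n_init times from different random starts and
    keep the result with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    return min((lloyd(X, k, rng) for _ in range(n_init)),
               key=lambda result: result[2])

# Toy data with an obvious optimal 2-clustering (WCSS = 1.0).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels, score = kmeans_restarts(X, k=2)
```

Scoring restarts by WCSS is exactly the objective defined earlier, so the best run is well defined even though individual runs may land in different local optima.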

The "assignment" step is also referred to as expectation step, the "update step" as maximization step, making this algorithm a variant of the generalized expectation-maximization algorithm.

Variations

  • The expectation-maximization algorithm (EM algorithm) maintains probabilistic assignments to clusters, instead of deterministic assignments, and multivariate Gaussian distributions instead of means.
  • k-means++ seeks to choose better starting clusters.
  • The filtering algorithm uses kd-trees to speed up each k-means step.[12]
  • Some methods attempt to speed up each k-means step using coresets[13] or the triangle inequality.[14]
  • Escape local optima by swapping points between clusters.[15]
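Of these variations, k-means++ changes only the seeding: the first center is chosen uniformly at random, and each subsequent center is drawn with probability proportional to its squared distance from the nearest center already chosen, which spreads the initial means out. A minimal sketch of that seeding step (the function name and toy data are my own); standard k-means then runs from the returned centers:

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: first center uniform at random, each later
    center drawn with probability proportional to its squared distance
    from the nearest already-chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1),
                    axis=1)
        probs = d2 / d2.sum()  # already-chosen points have probability 0
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

# Toy data: with two far-apart groups, the seeding almost always
# picks one center from each group.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = kmeans_pp_init(X, k=2, rng=0)
```

Because a chosen point has squared distance 0 to itself, the same observation can never be picked twice.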

Discussion

[Figure: Iris Flowers Clustering kMeans.svg — k-means clustering result for the Iris flower data set and actual species, visualized using ELKI. Cluster means are marked using larger, semi-transparent symbols.]

[Figure: ClusterAnalysis Mouse.svg — k-means clustering and EM clustering on an artificial dataset ("mouse"). The tendency of k-means to produce equally-sized clusters leads to bad results, while EM benefits from the Gaussian distributions present in the data set.]

The two key features of k-means which make it efficient are often regarded as its biggest drawbacks:

A key limitation of k-means is its cluster model. The concept is based on spherical clusters that are separable in such a way that the mean value converges towards the cluster center. The clusters are expected to be of similar size, so that assignment to the nearest cluster center is the correct assignment. When, for example, k-means with k = 3 is applied to the well-known Iris flower data set, the result often fails to separate the three Iris species contained in the data set. With k = 2, the two visible clusters (one containing two species) will be discovered, whereas with k = 3 one of the two clusters will be split into two even parts. In fact, k = 2 is more appropriate for this data set, despite the data set containing 3 classes. As with any other clustering algorithm, the k-means result relies on the data set satisfying the assumptions the algorithm makes. It works very well on some data sets, while failing miserably on others.

The result of k-means can also be seen as the Voronoi cells of the cluster means. Since data is split halfway between cluster means, this can lead to suboptimal splits as can be seen in the "mouse" example. The Gaussian models used by the Expectation-maximization algorithm (which can be seen as a generalization of k-means) are more flexible here by having both variances and covariances. The EM result is thus able to accommodate clusters of variable size much better than k-means as well as correlated clusters (not in this example).

 

This post introduces the concepts; code and a paper on optimizing k-means will follow in a later post:

Fast Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing

posted on 2010-10-19 18:57 by Sosi, 1635 reads, 0 comments, filed under Courses
