            Mahalanobis distance 馬氏距離

              In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936.[1] It is based on correlations between variables by which different patterns can be identified and analyzed. It is a useful way of determining similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. not dependent on the scale of measurements.

             

            Formally, the Mahalanobis distance of a multivariate vector x = ( x_1, x_2, x_3, \dots, x_N )^T from a group of values with mean \mu = ( \mu_1, \mu_2, \mu_3, \dots , \mu_N )^T and covariance matrix S is defined as:

D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}. [2]

Mahalanobis distance (or "generalized squared interpoint distance" for its squared value[3]) can also be defined as a dissimilarity measure between two random vectors \vec{x} and \vec{y} of the same distribution with the covariance matrix S:

d(\vec{x},\vec{y}) = \sqrt{(\vec{x}-\vec{y})^T S^{-1} (\vec{x}-\vec{y})}.
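To make the two definitions concrete, here is a minimal NumPy sketch (the function names and the numbers are illustrative, not from the original post). It computes the distance of a point from a distribution with mean mu and covariance S, and the dissimilarity between two points that share the same covariance matrix.

import numpy as np

def mahalanobis(x, mu, S):
    # Distance of point x from a distribution with mean mu and covariance S.
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve S z = diff rather than forming S^{-1} explicitly (numerically safer).
    return np.sqrt(diff @ np.linalg.solve(S, diff))

def mahalanobis_between(x, y, S):
    # Dissimilarity between two points of the same distribution with covariance S.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.sqrt(diff @ np.linalg.solve(S, diff))

# Toy example: mean, covariance, and test points (made-up numbers).
mu = np.array([0.0, 0.0])
S = np.array([[2.0, 0.3],
              [0.3, 1.0]])
print(mahalanobis([1.0, 1.0], mu, S))
print(mahalanobis_between([1.0, 1.0], [0.0, -1.0], S))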

            If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, then the resulting distance measure is called the normalized Euclidean distance:

d(\vec{x},\vec{y}) = \sqrt{\sum_{i=1}^N \frac{(x_i - y_i)^2}{\sigma_i^2}},

where \sigma_i is the standard deviation of the x_i over the sample set.
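The reduction to the normalized Euclidean distance is easy to check numerically; the short sketch below builds a diagonal covariance matrix from made-up per-coordinate standard deviations and confirms the two formulas agree.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.5])
sigma = np.array([0.5, 2.0, 1.0])      # per-coordinate standard deviations (illustrative)
S = np.diag(sigma ** 2)                # diagonal covariance matrix

diff = x - y
d_mahalanobis = np.sqrt(diff @ np.linalg.solve(S, diff))
d_normalized = np.sqrt(np.sum((x - y) ** 2 / sigma ** 2))

print(np.isclose(d_mahalanobis, d_normalized))   # True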

            Intuitive explanation

            Consider the problem of estimating the probability that a test point in N-dimensional Euclidean space belongs to a set, where we are given sample points that definitely belong to that set. Our first step would be to find the average or center of mass of the sample points. Intuitively, the closer the point in question is to this center of mass, the more likely it is to belong to the set.

            However, we also need to know if the set is spread out over a large range or a small range, so that we can decide whether a given distance from the center is noteworthy or not. The simplistic approach is to estimate the standard deviation of the distances of the sample points from the center of mass. If the distance between the test point and the center of mass is less than one standard deviation, then we might conclude that it is highly probable that the test point belongs to the set. The further away it is, the more likely that the test point should not be classified as belonging to the set.

This intuitive approach can be made quantitative by defining the normalized distance between the test point and the set to be \frac{x - \mu}{\sigma}. By plugging this into the normal distribution we can derive the probability of the test point belonging to the set.

            The drawback of the above approach was that we assumed that the sample points are distributed about the center of mass in a spherical manner. Were the distribution to be decidedly non-spherical, for instance ellipsoidal, then we would expect the probability of the test point belonging to the set to depend not only on the distance from the center of mass, but also on the direction. In those directions where the ellipsoid has a short axis the test point must be closer, while in those where the axis is long the test point can be further away from the center.

            Putting this on a mathematical basis, the ellipsoid that best represents the set's probability distribution can be estimated by building the covariance matrix of the samples. The Mahalanobis distance is simply the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.
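The directional dependence can be illustrated with a small sketch: two test points at the same Euclidean distance from the center get very different Mahalanobis distances when the covariance (the ellipsoid) is stretched along one axis. The numbers below are made up for illustration.

import numpy as np

mu = np.zeros(2)
S = np.array([[9.0, 0.0],              # large variance along the first axis
              [0.0, 1.0]])             # small variance along the second axis
S_inv = np.linalg.inv(S)

def mahalanobis(x, mu, S_inv):
    d = x - mu
    return np.sqrt(d @ S_inv @ d)

along_long_axis = np.array([3.0, 0.0])   # Euclidean distance 3, along the long axis
along_short_axis = np.array([0.0, 3.0])  # Euclidean distance 3, along the short axis

print(mahalanobis(along_long_axis, mu, S_inv))    # 1.0 -> relatively close to the set
print(mahalanobis(along_short_axis, mu, S_inv))   # 3.0 -> much farther in Mahalanobis terms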

            Relationship to leverage

            Mahalanobis distance is closely related to the leverage statistic, h, but has a different scale:[4]

squared Mahalanobis distance = (N - 1)(h - 1/N).
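Assuming the sample covariance (with the usual N - 1 divisor) and leverage taken as the diagonal of the hat matrix for a design with an intercept column, the identity can be checked numerically with the following sketch (random data, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = rng.normal(size=(N, p))

# Leverage: diagonal of the hat matrix for a design with an intercept column.
Xd = np.hstack([np.ones((N, 1)), X])
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
h = np.diag(H)

# Squared Mahalanobis distance of each row from the sample mean.
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)            # sample covariance, divisor N - 1
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)

print(np.allclose(d2, (N - 1) * (h - 1.0 / N)))   # True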

            Applications

            Mahalanobis' discovery was prompted by the problem of identifying the similarities of skulls based on measurements in 1927.[5]

Mahalanobis distance is widely used in cluster analysis and classification techniques. It is closely related to Hotelling's T-square distribution, used for multivariate statistical testing, and to Fisher's Linear Discriminant Analysis, which is used for supervised classification.[6]

            In order to use the Mahalanobis distance to classify a test point as belonging to one of N classes, one first estimates the covariance matrix of each class, usually based on samples known to belong to each class. Then, given a test sample, one computes the Mahalanobis distance to each class, and classifies the test point as belonging to that class for which the Mahalanobis distance is minimal.
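A minimal sketch of this minimum-distance classifier is shown below; the function names and the two synthetic Gaussian classes are assumptions made for illustration, not part of the original post.

import numpy as np

def mahalanobis_sq(x, mean, cov):
    # Squared Mahalanobis distance of x from a class with the given mean and covariance.
    diff = x - mean
    return diff @ np.linalg.solve(cov, diff)

def classify(x, class_samples):
    # Assign x to the class whose Mahalanobis distance is minimal.
    # class_samples maps a label to an (n_i, d) array of training points for that class.
    best_label, best_d2 = None, np.inf
    for label, samples in class_samples.items():
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)     # per-class covariance estimate
        d2 = mahalanobis_sq(x, mean, cov)
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label

# Two synthetic classes for demonstration only.
rng = np.random.default_rng(1)
classes = {
    "A": rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200),
    "B": rng.multivariate_normal([4, 4], [[2.0, -0.5], [-0.5, 1.5]], size=200),
}
print(classify(np.array([3.5, 3.0]), classes))   # expected: "B"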

Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models. A point that has a greater Mahalanobis distance from the rest of the sample population of points is said to have higher leverage, since it has a greater influence on the slope or coefficients of the regression equation. Mahalanobis distance is also used to determine multivariate outliers. Regression techniques can be used to determine whether a specific case within a sample population is an outlier via the combination of two or more variable scores. A point can be a multivariate outlier even if it is not a univariate outlier on any single variable.
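One common (heuristic) way to operationalize this is to flag points whose squared Mahalanobis distance exceeds a chi-square quantile, relying on the fact that under approximate multivariate normality the squared distance is roughly chi-square distributed with d degrees of freedom. The sketch below uses SciPy's chi2 quantile function; the function name and the alpha threshold are assumptions for illustration.

import numpy as np
from scipy.stats import chi2

def multivariate_outliers(X, alpha=0.001):
    # Flag rows of X whose squared Mahalanobis distance from the sample mean
    # exceeds the (1 - alpha) quantile of a chi-square with d degrees of freedom.
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    cutoff = chi2.ppf(1 - alpha, df=X.shape[1])
    return np.where(d2 > cutoff)[0]

# Example: clean Gaussian data plus one injected outlier (illustrative).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
X[10] = [8.0, -7.0, 9.0]
print(multivariate_outliers(X))   # expected to include index 10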

Mahalanobis distance is also widely used in biology, for example in predicting protein structural class,[7] predicting membrane protein type,[8] predicting protein subcellular localization,[9] as well as predicting many other attributes of proteins through their pseudo amino acid composition.[10]

             

This is the exponent term of the multivariate Gaussian distribution! It comes up a lot in classification and clustering.

posted on 2010-10-12 09:47 by Sosi | Category: Taps in Research
