function rate = KNN(Train_data, Train_label, Test_data, Test_label, k, Distance_mark)

% K-Nearest-Neighbor classifier (K-NN classifier)
% Input:
%     Train_data, Test_data are the training data set and the test data
%     set, respectively. (Each row is a data point.)
%     Train_label, Test_label are column vectors. They are the labels of the
%     training data set and the test data set, respectively.
%     k is the number of nearest neighbors.
%     Distance_mark           :   ['Euclidean', 'L2' | 'L1' | 'Cos']
%     'Cos' denotes the cosine distance.
% Output:
%     rate: accuracy of the K-NN classifier
%
%    Examples:
%      
% %Classification problem with three classes
% A = rand(50,300);
% B = rand(50,300)+2;
% C = rand(50,300)+3;
% % label vector for the three classes
% gnd = [ones(300,1);2*ones(300,1);3*ones(300,1)];
% fea = [A B C]';
% trainIdx = [1:150,301:450,601:750]';
% testIdx = [151:300,451:600,751:900]';
% fea_Train = fea(trainIdx,:);
% gnd_Train = gnd(trainIdx);
% fea_Test = fea(testIdx,:);
% gnd_Test = gnd(testIdx);
% rate = KNN(fea_Train,gnd_Train,fea_Test,gnd_Test,1)
%
%
%
%Reference:
%
% If you use this MATLAB code, we would appreciate it very much if you cite the following paper:
% Jie Gui, Tongliang Liu, Dacheng Tao, Zhenan Sun, Tieniu Tan, "Representative Vector Machines: A unified framework for classical classifiers", IEEE
% Transactions on Cybernetics.
    
%This code was written by Gui Jie on the evening of 2009/03/11.
%If you find any bugs in the code, feel free to contact me.
if nargin < 5
    error('Not enough arguments!');
elseif nargin < 6
    Distance_mark = 'L2';
end
 
[n, dim]   = size(Test_data);      % number of test samples and feature dimension
train_num  = size(Train_data, 1);  % number of training samples
% Normalize each feature to have zero mean and unit variance.
% If you need the following four lines, you can uncomment them.
% M        = mean(Train_data); % mean & std of the training data set
% S        = std(Train_data);
% Train_data = (Train_data - ones(train_num, 1) * M)./(ones(train_num, 1) * S); % normalize training data set
% Test_data            = (Test_data-ones(n,1)*M)./(ones(n,1)*S); % normalize data
U        = unique(Train_label); % class labels
nclasses = length(U);%number of classes
Result  = zeros(n, 1);
Count   = zeros(nclasses, 1);
dist=zeros(train_num,1);
for i = 1:n
    % compute distances between test data and all training data and
    % sort them
    test = Test_data(i,:);
    for j = 1:train_num
        train = Train_data(j,:);
        V = test - train;
        switch Distance_mark
            case {'Euclidean', 'L2'}
                dist(j,1)=norm(V,2); % Euclidean (L2) distance
            case 'L1'
                dist(j,1)=norm(V,1); % L1 distance
            case 'Cos'
                dist(j,1)=acos(test*train'/(norm(test,2)*norm(train,2)));     % cosine distance: angle between the vectors, in [0, pi]
            otherwise
                dist(j,1)=norm(V,2); % Default distance
        end
    end
    [Dummy Inds] = sort(dist);
    % compute the class labels of the k nearest samples
    Count(:) = 0;
    for j = 1:k
        ind        = find(Train_label(Inds(j)) == U); % class index of the j-th nearest neighbor
        Count(ind) = Count(ind) + 1;
    end % Count: how many of the k nearest neighbors fall in each class
    
    % determine the class of the data sample
    [dummy ind] = max(Count);
    Result(i)   = U(ind);
end
correctnumbers=length(find(Result==Test_label));
rate=correctnumbers/n;






-------------------------------------------------- The code ends here; notes follow ---------------------------------------------------------------------

The difference between cosine distance and cosine similarity

Cosine similarity is the measure most commonly used in traditional document classification to quantify the distance between documents. It compares two d-dimensional vectors by the angle between them, and the resulting value lies between 0 and 1: the smaller the angle between the two vectors, the closer the value is to 1; otherwise, the closer it is to 0. Given two points a = [a1, a2, …, ad] and b = [b1, b2, …, bd] in d-dimensional space, their cosine similarity can be written as:
cosineSimilarity(a,b) = dot(a,b) / (norm(a)*norm(b))   [I think this should be called cosineSimilarity rather than cosineDistance: the larger the similarity, the smaller the distance should be. For example, when the angle between a and b is 0, they are most similar, so the similarity is largest and the distance smallest.]
dot(a,b) denotes the inner product of a and b. Since the inner product is defined as a·b = |a| × |b| × cosθ (in general θ ∈ [0, π], http://baike.baidu.com/view/1485493.htm), the value defined this way is not confined to [0, 1] but lies in [-1, 1]. There are two ways to turn it into a distance:
(1) The code above is correct: it uses acos to convert the cosine into an angle in [0, π]. The distance does not have to lie in [0, 1]; my code maps it to [0, π], where a larger value means a larger distance.
(2) cosineDistance(a,b) = 1 - cosineSimilarity(a,b) = 1 - dot(a,b) / (norm(a)*norm(b)). The range of cosineDistance is then [0, 2]. (A sketch of both options follows the example below.)
Example:
a=[1 1 1]; b=[1 0 0];
cosineDistance = dot(a,b) / (norm(a)*norm(b))
cosineDistance =
    0.5774   [http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/02%E8%B7%9D%E9%9B%A2%E8%88%87%E7%9B%B8%E4%BC%BC%E5%BA%A6.pdf, already saved on my computer as 距離與相似度.pdf (Distance and Similarity)]
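As a quick check of the two conventions above, here is a minimal MATLAB sketch (the variable names cosSim, angDist, and cosDist are only illustrative) that computes, for the same pair of vectors, the cosine similarity, the acos-based angular distance used in the KNN code, and the 1 - similarity distance:
% Minimal sketch: two ways of turning cosine similarity into a distance.
a = [1 1 1]; b = [1 0 0];
cosSim  = dot(a,b) / (norm(a)*norm(b));  % cosine similarity, 0.5774 for this pair
angDist = acos(cosSim);                  % option (1): angle in [0, pi], as in the KNN code above
cosDist = 1 - cosSim;                    % option (2): 1 - similarity, range [0, 2]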
"KDD17_Linearized GMM Kernels"的公式1是cosine的等價表示形式;“Min-Max Kernels 1503.01737”的公式1是Min-Max Kernel的標準定義

From p. 3 of "Minimal Local Reconstruction Error Measure Based Discriminant Feature Extraction and Classification":
For robustness, before classification we generally need to normalize the feature vectors, making the length of each feature vector equal to 1, that is, x -> x / ||x||.
The normalized Euclidean distance is equivalent to the cosine distance.
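That equivalence follows from a one-line identity: for unit-length vectors, ||x - y||^2 = 2 - 2*cos(x, y), so ranking neighbors by Euclidean distance after normalization is the same as ranking them by cosine similarity. A minimal MATLAB sketch (the vectors here are arbitrary) to verify it numerically:
% Sketch: after normalizing to unit length, the squared Euclidean distance
% equals 2*(1 - cosine similarity), so the two rankings coincide.
x = rand(1,5); y = rand(1,5);
xn = x / norm(x); yn = y / norm(y);
euclid2 = norm(xn - yn)^2;               % squared Euclidean distance of the normalized vectors
cosSim  = dot(x,y) / (norm(x)*norm(y));  % cosine similarity of the original vectors
disp([euclid2, 2*(1 - cosSim)])          % the two numbers agree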
(1) My junior colleague Lin Zhu pointed out that replacing the loops with a precomputed distance matrix saves time, because MATLAB loops are slow; but for large sample sizes the loops are still necessary, otherwise the computation runs out of memory (a loop-free sketch is given below). I recall that Jinsong, our course instructor, also provided a KNN code, and his was likewise implemented with loops. MATLAB has a built-in function knnclassify; the code SPP_1NN.m accompanying the paper "Sparsity preserving projections" uses that function. On ASLAN, my KNN and knnclassify give exactly the same recognition rate.
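A minimal sketch of the loop-free version (assuming the same Test_data/Train_data row layout as in the code above; bsxfun is used so it also runs on older MATLAB versions):
% Sketch: loop-free squared Euclidean distance matrix between Test_data (n x dim)
% and Train_data (train_num x dim). Fast, but needs O(n*train_num) memory.
D = bsxfun(@plus, sum(Test_data.^2, 2), sum(Train_data.^2, 2)') - 2 * Test_data * Train_data';
[Dummy, Inds] = sort(D, 2);   % Inds(i, 1:k) are the indices of the k nearest training samples for test point i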
(2) Extremely important: in the fourth-to-last line of the program above, do not use Result(i) = ind. That is fine when the labels are consecutive integers 1, 2, 3, … (as for Yale), but it fails for binary classification with labels 1 and -1. SRC_QC and SRC_QC2 have the same issue: in their third-to-last line, Result(i) = index must be replaced by Result(i) = classLabel(index). Originally only that one spot was fixed, but line 50 of SRC_QC2 and line 42 of SRC_QC also need ii changed to classLabel(ii). It was exactly this bug that produced the wrong result of SRC having a 50% error rate with zero variance on ASLAN. The corrected SRC_QC2 and SRC_QC programs are in the ASLAN directory. (A short illustration of the bug follows.)
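A small illustration of that bug (the label values below are only an example): with labels -1 and 1, unique() returns U = [-1; 1], so the winning class index must be mapped back through U.
% Illustration of the labelling bug: a class INDEX is not a class LABEL.
Train_label = [-1; -1; 1; 1];
U   = unique(Train_label);    % U = [-1; 1]
ind = 1;                      % suppose class 1 (label -1) wins the vote
% Result(i) = ind;            % WRONG: stores 1, but the true label is -1
% Result(i) = U(ind);         % correct: stores U(1) = -1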