99久久这里只有精品,国内精品久久久久久久久电影网,99久久精品免费国产大片

協同推薦算法實踐之Slope One的介紹（轉）

       Slope One 之一 : 簡單高效的協同過濾算法(轉)(
      原文地址：http://blog.sina.com.cn/s/blog_4d9a06000100am1d.html

       現在做的一個項目中需要用到推薦算法, 在網上查了一下. Beyond Search介紹了一個協同過濾算法(Collaborative Filtering) : Slope One;和其它類似算法相比, 它的最大優點在于算法很簡單, 易于實現, 執行效率高, 同時推薦的準確性相對很高;

基本概念
       Slope One的基本概念很簡單, 例子1, 用戶X, Y和A都對Item1打了分. 同時用戶X,Y還對Item2打了分, 用戶A對Item2可能會打多少分呢?

User Rating to Item 1 Rating to Item 2

X 5 3

Y 4 3

A 4 ?

        根據SlopeOne算法, 應該是:4 - ((5-3) + (4-3))/2 = 2.5.
        解釋一下. 用戶X對Item1的rating是5, 對Item2的rating是3, 那么他可能認為Item2應該比Item1少兩分. 同時用戶Y認為Item2應該比Item1少1分. 據此我們知道所有對Item1和Item2都打了分的用戶認為Item2會比Item1平均少1.5分. 所以我們有理由推薦用戶A可能會對Item2打(4-1.5)=2.5分;

        很簡單是不是? 找到對Item1和Item2都打過分的用戶, 算出rating差的平均值, 這樣我們就能推測出對Item1打過分的用戶A對Item2的可能Rating, 并據此向A用戶推薦新項目.
        這里我們能看出Slope One算法的一個很大的優點, 在只有很少的數據時候也能得到一個相對準確的推薦, 這一點可以解決Cold Start的問題.

       加權算法
       接下來我們看看加權算法(Weighted Slope One). 如果有100個用戶對Item1和Item2都打過分, 有1000個用戶對Item3和Item2也打過分. 顯然這兩個rating差的權重是不一樣的. 因此我們的計算方法是
      (100*(Rating 1 to 2) + 1000(Rating 3 to 2)) / (100 + 1000)。更詳細的加權算法實例：請看這里

       上面討論的是用戶只對項目的喜好程度打分.還有一種情況下用戶也可以對項目的厭惡程度打分. 這時可以使用雙極SlopeOne算法(BI-Polar SlopeOne). 我還在研究這篇論文,搞懂了再寫吧, 呵呵;

Slope One 算法是由 Daniel Lemire 教授在 2005 年提出. 這里可以找到論文原文(PDF);上面也列出了幾個參考實現. 現在有Python, Java和Erlang, 還沒有C#.這篇: tutorial about how to implement Slope One in Python是一個很好的怎么實現SlopeOne并使用它來推薦的例子。

Slope One 算法 (三) ：加權平均實例
原文地址：http://blog.sina.com.cn/s/blog_4d9a06000100am69.html

例子：
首先計算item1和item2的平均差值，((5-3)+(3-4))/2=0.5，還有item1和item3的平均差值，就是5-2=3，然后推算lucy對item1的評分，根據item1和item2的平均差值來看lucy對item1的評分可能為2+0.5=2.5，同理根據item1和item3的平均差值lucy對item1的評分可能為5+3=8.
現在如何取舍那？使用加權平均數應該是一種比較好的方法:（因為2.5是根據兩個值推算的，8是通過一個只推算的）
slope one 算法差不多真的就是這么簡單了！
有一個開源的Java程序taste里面有一個完整的slope one算法的實現，包括程序和一個關于grouplens數據的實例程序（或者說是驗證程序……）。
個人覺得slope one 很好、很強大呀！足夠簡單，推薦準確度也不遜色與其他復雜的推薦算法（當然，這個東西更大程度上取決與數據樣本）。而且taste程序寫的也很不錯，稍加改造應該就可以用了。

Slope One 之二: C#實現
原文地址：http://blog.sina.com.cn/s/blog_4d9a06000100am69.html

上一篇簡單介紹了Slope One算法的概念, 這次介紹C#實現
使用基于Slope One算法的推薦需要以下數據:
1. 有一組用戶
2. 有一組Items(文章, 商品等)
3. 用戶會對其中某些項目打分(Rating)表達他們的喜好
Slope One算法要解決的問題是, 對某個用戶, 已知道他對其中一些Item的Rating了, 向他推薦一些他還沒有Rating的Items, 以增加銷售機會. :-)

一個推薦系統的實現包括以下三步:
1. 計算出任意兩個Item之間Rating的差值
2. 輸入某個用戶的Rating記錄, 推算出對其它Items的可能Rating值
3. 根據Rating的值排序, 給出Top Items;

第一步:例如我們有三個用戶和4個Items, 用戶打分的情況如下表.

Ratings User1 User2 User3

Item1 5 4 4

Item2 4 5 4

Item3 4 3 N/A

Item4 N/A 5 5

在第一步中我們的工作就是計算出Item之間兩兩的打分之差, 也就是使說計算出以下矩陣:

　 Item1 Item2 Item3 Item4

Item1 N/A 0/3 2/2 -2/2

Item2 0/3 N/A 2/2 -1/2

Item3 -2/2 -2/2 N/A -2/1

Item4 2/2 1/2 2/1 N/A

考慮到加權算法, 還要記錄有多少人對這兩項打了分(Freq), 我們先定義一個結構來保存Rating:
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get {return Value / Freq;}
        }
    }
我決定用一個Dictionary來保存這個結果矩陣:
    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return Item1Id + "/" + Item2Id;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            return this.Keys.Contains<string>(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

接下來我們來實現SlopeOne類. 首先創建一個RatingDifferenceCollection來保存矩陣, 還要創建HashSet來保持系統中總共有哪些Items:
    public class SlopeOne
    {
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection(); // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>(); // Tracking how many items totally

方法AddUserRatings接收一個用戶的打分記錄(Item-Rating): public void AddUserRatings(IDictionary<int, float> userRatings)
AddUserRatings中有兩重循環, 外層循環遍歷輸入中的所有Item, 內層循環再遍歷一次, 計算出一對Item之間Rating的差存入_DiffMarix, 記得Freq加1, 以記錄我們又碰到這一對Items一次:
    Rating ratingDiff = _DiffMarix[item1Id, item2Id];
    ratingDiff.Value += item1Rating - item2Rating;
    ratingDiff.Freq += 1;

對每個用戶調用AddUserRatings后, 建立起矩陣. 但我們的矩陣是以表的形式保存:

　 Rating Dif Freq

Item1-2 0 3

Item1-3 1 2

Item2-1 0 3

Item2-3 1 2

Item3-1 -1 2

Item3-2 -1 2

Item1-4 -1 2

Item2-4 -0.5 2

Item3-4 -2 1

Item4-1 1 2

Item4-2 0.5 2

Item4-3 2 1

第二步:輸入某個用戶的Rating記錄, 推算出對其它Items的可能Rating值:
public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
也是兩重循環, 外層循環遍歷_Items中所有的Items; 內層遍歷userRatings, 用此用戶的ratings結合第一步得到的矩陣, 推算此用戶對系統中每個項目的Rating:
    Rating itemRating = new Rating(); // Prediction of this user's rating
    ...
    Rating diff = _DiffMarix[itemId, inputItemId]:
    itemRating.Value += diff.Freq * (diff.AverageValue + userRating.Value);
    itemRating.Freq += diff.Freq;

第三步:得到用戶的Rating預測后,就可以按rating排序, 向用戶推薦了. 測試一下:
    Dictionary<int, float> userRating userRating = new Dictionary<int, float>();
    userRating.Add(1, 5);
    userRating.Add(3, 4);
    IDictionary<int, float> Predictions = test.Predict(userRating);
    foreach (var rating in Predictions)
    {
        Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
    }
輸出:
Item 2 Rating: 5
Item 4 Rating: 6

改進:
觀察之前產生的矩陣可以發現, 其中有很多浪費的空間; 例如: 對角線上永遠是不會有值的. 因為我們是用線性表保存矩陣值, 已經避免了這個問題;
對角線下方的值和對角線上方的值非常對稱,下方的值等于上方的值乘以-1; 在數據量很大的時候是很大的浪費. 我們可以通過修改RatingDifferenceCollection來完善. 可以修改GetKey方法, 用Item Pair來作為Key:
    private string GetKey(int Item1Id, int Item2Id) {
        return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id ;;
    }
完整代碼在這里,在.net 3.5上調試通過;

Reference:
Tutorial about how to implement Slope One in Python
Slope One Predictors for Online Rating-Based Collaborative Filtering
Recommender Systems: Slope One

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SlopeOne
{
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get { return Value / Freq; }
        }
    }

    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id ;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            return this.Keys.Contains<string>(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

     public class SlopeOne
    {
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection();  // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>();  // Tracking how many items totally

        public void AddUserRatings(IDictionary<int, float> userRatings)
        {
            foreach (var item1 in userRatings)
            {
                int item1Id = item1.Key;
                float item1Rating = item1.Value;
                _Items.Add(item1.Key);

                foreach (var item2 in userRatings)
                {
                    if (item2.Key <= item1Id) continue; // Eliminate redundancy
                    int item2Id = item2.Key;
                    float item2Rating = item2.Value;

                    Rating ratingDiff;
                    if (_DiffMarix.Contains(item1Id, item2Id))
                    {
                        ratingDiff = _DiffMarix[item1Id, item2Id];
                    }
                    else
                    {
                        ratingDiff = new Rating();
                        _DiffMarix[item1Id, item2Id] = ratingDiff;
                    }

                    ratingDiff.Value += item1Rating - item2Rating;
                    ratingDiff.Freq += 1;
                }
            }
        }

        // Input ratings of all users
        public void AddUerRatings(IList<IDictionary<int, float>> Ratings)
        {
            foreach(var userRatings in Ratings)
            {
                AddUserRatings(userRatings);
            }
        }

        public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
        {
            Dictionary<int, float> Predictions = new Dictionary<int, float>();
            foreach (var itemId in this._Items)
            {
                if (userRatings.Keys.Contains(itemId))    continue; // User has rated this item, just skip it

                Rating itemRating = new Rating();

                foreach (var userRating in userRatings)
                {
                    if (userRating.Key == itemId) continue;
                    int inputItemId = userRating.Key;
                    if (_DiffMarix.Contains(itemId, inputItemId))
                    {
                        Rating diff = _DiffMarix[itemId, inputItemId];
                        itemRating.Value += diff.Freq * (userRating.Value + diff.AverageValue * ((itemId < inputItemId) ? 1 : -1));
                        itemRating.Freq += diff.Freq;
                    }
                }
                Predictions.Add(itemId, itemRating.AverageValue);
            }
            return Predictions;
        }

        public static void Test()
        {
            SlopeOne test = new SlopeOne();

            Dictionary<int, float> userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(2, 4);
            userRating.Add(3, 4);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 5);
            userRating.Add(3, 3);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 4);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(3, 4);

            IDictionary<int, float> Predictions = test.Predict(userRating);
            foreach (var rating in Predictions)
            {
                Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
            }
        }
    }
}

可惜啊，代碼是vs2008寫的，我的項目是vs2005的，改編了一下這里可以下載！

posted on 2010-07-19 17:49 漂漂閱讀(10305) 評論(0) 編輯收藏引用所屬分類: 算法

只有注冊用戶登錄后才能發表評論。
【推薦】100%開源！大型工業跨平臺軟件C++源碼提供，建模，組態！

相關文章: 在windows中編譯sphinx1.10beta--coreseek(類似)(翻譯) 淺談shuffle算法--播放器的另一種隨機算法協同推薦算法實踐之Slope One的介紹（轉）傅里葉變換和拉普拉斯變換的意義 A*尋路初探(轉) 紅黑樹（Red-Black Tree）(轉) 數學之美番外篇：快排為什么那樣快(轉）什么是算法，為什么需要學算法，以及算法學到什么程度（轉載）

網站導航: 博客園 IT新聞 BlogJava 博問 Chat2DB 管理

常用鏈接

留言簿(11)

隨筆分類(159)

隨筆檔案(224)

文章分類(2)

文章檔案(4)

經典c++博客

搜索

最新評論

閱讀排行榜

評論排行榜

	Item1	Item2	Item3	Item4
Item1	N/A	0/3	2/2	-2/2
Item2	0/3	N/A	2/2	-1/2
Item3	-2/2	-2/2	N/A	-2/1
Item4	2/2	1/2	2/1	N/A

	Rating Dif	Freq
Item1-2	0	3
Item1-3	1	2
Item2-1	0	3
Item2-3	1	2
Item3-1	-1	2
Item3-2	-1	2
Item1-4	-1	2
Item2-4	-0.5	2
Item3-4	-2	1
Item4-1	1	2
Item4-2	0.5	2
Item4-3	2	1