
On when the map/reduce combiner runs

Posted on 2012-11-06 23:52 by whspecial. Category: hadoop

When exactly does the map/reduce combiner run?

Most material online says that the combiner runs on the map side: it is applied to the map output, and the combined result is then passed to the reducer. However, a problem I ran into at work led me to discover that the combiner can also run on the reducer side, and that it may run more than once.
After some searching I found that this is a new feature introduced in hadoop-0.18:
Changed policy for running combiner. The combiner may be run multiple times as the map's output is sorted and merged. Additionally, it may be run on the reduce side as data is merged. The old semantics are available in Hadoop 0.18 if the user calls: job.setCombineOnlyOnce(true).
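Because the combiner may run zero, one, or several times, the final result must not depend on how often it is applied. The following is a minimal sketch of that point in plain Java, with no Hadoop dependency. The class and method names (CombinerSafetyDemo, sumCombine, meanCombine, and so on) are hypothetical and exist only for illustration; the two lists stand in for two spills holding values of the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; plain Java, not part of Hadoop.
class CombinerSafetyDemo {

    // Sum-style combiner: replaces a group of partial sums with their total.
    static List<Double> sumCombine(List<Double> values) {
        return Collections.singletonList(reduceSum(values));
    }

    // Mean-style combiner: replaces a group of values with their average.
    // This loses the group size, so it is NOT safe to apply before the reducer.
    static List<Double> meanCombine(List<Double> values) {
        return Collections.singletonList(reduceMean(values));
    }

    // Final reducers.
    static double reduceSum(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total;
    }

    static double reduceMean(List<Double> values) {
        return reduceSum(values) / values.size();
    }

    // Stand-in for merging two spills that hold values of the same key.
    static List<Double> concat(List<Double> a, List<Double> b) {
        List<Double> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<Double> spill1 = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> spill2 = Arrays.asList(4.0);

        // Sum: combiner applied zero times, once per spill, then once more on merge.
        List<Double> raw = concat(spill1, spill2);
        List<Double> perSpill = concat(sumCombine(spill1), sumCombine(spill2));
        List<Double> onMerge = sumCombine(perSpill);
        System.out.println(reduceSum(raw));      // 10.0
        System.out.println(reduceSum(perSpill)); // 10.0
        System.out.println(reduceSum(onMerge));  // 10.0

        // Mean: the answer changes as soon as the combiner runs.
        List<Double> meanPerSpill = concat(meanCombine(spill1), meanCombine(spill2));
        System.out.println(reduceMean(raw));          // 2.5
        System.out.println(reduceMean(meanPerSpill)); // 3.0
    }
}
```

The sum survives any number of combine passes, while the mean of per-spill means differs from the true mean. A combiner written like meanCombine would give results that depend on how many spills and merges happened to occur.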
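In fact the combiner runs on both the mapper side and the reducer side. From reading the code, a combine happens at the following points:
1) On the mapper side, during the spill phase: a combine is performed when the records in the buffer exceed the threshold.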

if (spstart != spindex) {
    …
    combineAndSpill(kvIter, combineInputCounter);
}

2) On the mapper side, during the merge phase: a combine is performed when the number of spill files being merged is >= 3.

if (null == combinerClass || numSpills < minSpillsForCombine) {
    Merger.writeFile(kvIter, writer, reporter);
} else {
    combineCollector.setWriter(writer);
    combineAndSpill(kvIter, combineInputCounter);
}

3) On the reducer side, a combine is always performed.
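The two map-side conditions above can be restated as small predicates, which makes it easier to see for which spill counts a combine happens at merge time. This is only a sketch: CombineTimingSketch and its methods are hypothetical names, the default of 3 is taken from the ">= 3" stated above, and the assumption that the spill-phase check also requires a combiner to be configured comes from the merge-phase snippet, not from the elided spill-phase code.

```java
// Hypothetical helper class restating the conditions quoted above; not Hadoop code.
class CombineTimingSketch {

    // Assumed value of minSpillsForCombine, matching the ">= 3" in the text.
    static final int MIN_SPILLS_FOR_COMBINE = 3;

    // Spill phase: combine only if the partition holds records
    // (spstart != spindex). Requiring a combiner here is an assumption.
    static boolean combinesAtSpill(boolean hasCombiner, int spstart, int spindex) {
        return hasCombiner && spstart != spindex;
    }

    // Merge phase: negation of
    // (null == combinerClass || numSpills < minSpillsForCombine).
    static boolean combinesAtMerge(boolean hasCombiner, int numSpills, int minSpillsForCombine) {
        return hasCombiner && numSpills >= minSpillsForCombine;
    }

    public static void main(String[] args) {
        // Show the merge-phase decision for a few spill counts.
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println(numSpills + " spill(s): combine at merge = "
                    + combinesAtMerge(true, numSpills, MIN_SPILLS_FOR_COMBINE));
        }
    }
}
```

With a combiner configured, this prints false for 1 and 2 spills and true for 3 and 4. A map task that spills only once or twice therefore runs the combiner only during the spill phase on the map side.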
