by Meng Yan on Nov.15, 2006, under Other
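Herb Sutter, Microsoft's well-known C++ guru, wrote a heavyweight article in early 2005, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software", predicting the next major upheaval that software development would face after OO: parallel computing.
Software development in the era ruled by Moore's Law showed a very interesting phenomenon: "Andy giveth, and Bill taketh away." No matter how fast CPU clock speeds became, we always found a way to use them up, and we reveled in the performance gains our programs picked up from every hardware upgrade.
I remember writing a Gomoku (five-in-a-row) program in my sophomore year. The algorithm predefined a set of stone patterns (with priorities), then scanned the board and analyzed the position to see which move mattered most to me at that point. Of course you also have to block your opponent, which means swapping the two sides' patterns and evaluating again. Looking only one move ahead makes it easy for a cunning opponent to fool you, so thinking several moves ahead also required recursion and backtracking. On the machines of that time, looking 3 moves ahead already took a matter of seconds. Later, while packing up at graduation, I found the program again and tried it out: looking 10 moves ahead took so little time that I could barely notice it.
I do not know whether you have had the same experience, but we have all been enjoying this free lunch without realizing it. With Moore's Law coming to an early end, however, the free lunch eventually has to be paid back. Hardware designers are still trying: Hyper-Threading CPUs (an extra set of registers, equivalent to one more logical CPU) keep the pipeline as fully loaded as possible and allow operations from several threads to run in parallel, raising the performance of multithreaded programs by up to about 15%; larger caches benefit single-threaded and multithreaded programs alike. These may help you for a while longer, but the point is that we have to change. Are you ready for the coming upheaval?
Concurrency Programming != Multi-Thread Programming. Many people will say "who can't do multithreading?", but the question is why and how you use multiple threads. I once wrote an ACDSee-like image viewing and processing program, which I normally use for my digital photos. I used a lot of threads in it, but mainly to keep image processing from blocking the UI, so the CPU-intensive computation ran on background threads. I did not parallelize the operations on the image matrix itself.
I think the real challenge of concurrent programming lies in the change of programming model. Programmers need a very clear picture of how their programs can be parallelized and, more importantly, of how to implement that parallelization (including architecture, fault tolerance, real-time monitoring and so on), how to debug it, and how to test it.
At Google, massive amounts of data have to be processed within a limited time every day (every Internet company actually runs into this problem), and every programmer has to do distributed development, which involves distribution, scheduling, monitoring, fault tolerance and more. Google's MapReduce abstracts the distributed business logic away from these complicated details, so that programmers with little or no parallel development experience can still write parallel applications.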
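The two most important words in MapReduce are Map and Reduce. Anyone familiar with functional languages will find both words very familiar at first glance. FP calls such functions "higher-order functions" (higher-order functions are regarded as one of the sharpest tools of functional programming); that is, they are written to be combined with other functions (or, put differently, to be called by other functions). If you insist on a comparison, think of callback functions in C or functors in the STL. For example, to search an STL container you specify a functor (a comparator) that compares two elements, and this comparator is invoked as the container is traversed.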
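As a rough illustration of the comparator idea, here is a minimal Python sketch. It sorts rather than searches, and the names `by_length_then_alpha` and `apply_twice` are made up for this example; the point is only that one function is handed to another, which then calls it.

```python
from functools import cmp_to_key


def by_length_then_alpha(a: str, b: str) -> int:
    """Comparator: negative if a sorts before b, positive if after, 0 if equal."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


def apply_twice(f, x):
    """A higher-order function: it receives another function as an argument."""
    return f(f(x))


words = ["reduce", "map", "key", "combiner"]

# sorted() invokes the comparator for us while it orders the container.
ordered = sorted(words, key=cmp_to_key(by_length_then_alpha))
quadrupled = apply_twice(lambda n: n * 2, 5)

print(ordered)
print(quadrupled)
```

The caller never loops over the container itself; it only supplies the comparison logic, which is the same division of labor MapReduce asks of its users.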
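Take the image processing program mentioned earlier as an example. Most image processing operations apply some computation to the image matrix, and these computations usually come in two kinds: mapping and reducing. Consider two effects. The "old photo" effect usually strengthens the G/B values of the picture and then adds a small random offset to every pixel; these operations are independent for each element of the two-dimensional matrix, so they are Map operations. The "engraving" effect has to extract the edges of the image, which requires computation across elements, so it is a kind of Reduce operation. A simpler example: the one-dimensional matrix (array) [0,1,2,3,4] can be mapped to [0,2,4,6,8] (multiply by 2) or to [1,2,3,4,5] (add 1). It can be reduced to 0 (product of the elements) or to 10 (sum of the elements).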
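The array example can be written out directly with Python's built-in `map` and `functools.reduce`. This is only a sketch of the two kinds of operation on a small list; it says nothing about how MapReduce itself is implemented.

```python
from functools import reduce
import operator

data = [0, 1, 2, 3, 4]

# Map: each output element depends only on the matching input element.
doubled = list(map(lambda x: x * 2, data))
plus_one = list(map(lambda x: x + 1, data))

# Reduce: all elements are folded into a single value.
product = reduce(operator.mul, data, 1)
total = reduce(operator.add, data, 0)

print(doubled, plus_one, product, total)
```

Because every mapped element is independent of the others, the map step can be split across workers freely, while the reduce step has to bring values together.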
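When facing a complex problem, the ancients taught us to "divide and rule"; the corresponding English phrase is "Divide and Conquer". Map/Reduce is really a Divide/Conquer process: the problem is divided so that the Map computations on the pieces can run with a high degree of parallelism, and the Map results are then reduced (by some key) to obtain the final result.
Googlers realized that this is the core of the problem and that everything else is a common concern, so they abstracted MapReduce out on its own. Google's programmers can then focus on application logic: which keys to decompose the problem by, which operations are Map operations, and which are Reduce operations. The other hard problems of parallel computing, such as distribution, job scheduling, fault tolerance and inter-machine communication, are all left to the Map/Reduce framework, which greatly simplifies the whole programming model.
Another characteristic of MapReduce is that the inputs and outputs of Map and Reduce are all intermediate temporary files (MapReduce relies on the Google File System to manage and access these files) rather than some other form of communication between processes or machines. I think this is Google's consistent style: reduce the complex to the simple and go back to basics.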
Next, let us set everything else aside and study the Map and Reduce operations themselves. (Other aspects, such as fault tolerance and backup tasks, also rest on classic experience and implementations; the paper covers them all in detail.)
Definition of Map:
Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
Definition of Reduce:
The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.
The MapReduce paper gives the following example: counting how many times each word occurs in a collection of documents.
The input to the Map operation is each document; every occurrence of every word in the input document is emitted to an intermediate file.
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");
Suppose we have two documents with the following contents:
A - "I love programming"
B - "I am a blogger, you are also a blogger".
After the Map operation, the intermediate file produced for document B will be:
I,1
am,1
a,1
blogger,1
you,1
are,1
also,1
a,1
blogger,1
The input to the Reduce operation is a word together with the sequence of its occurrence counts. For the example above, that is ("I", [1, 1]), ("love", [1]), ("programming", [1]), ("am", [1]), ("a", [1, 1]) and so on. Reduce then computes the total number of occurrences for each word.
reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
The final output will then be: ("I", "2"), ("a", "2"), ...
The actual execution order is:
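… to disk, and the file information is passed back to the Master (the Master has to forward this information to the Reduce workers). The most important point here is that when writing to disk, the intermediate file has to be partitioned (into, say, R partitions). In the example above, if all the information were stored in a single file, the Reduce worker would once again become the bottleneck. We only need to guarantee that identical keys end up in the same partition in order to decompose this problem.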
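To make the word-count example and the partitioning step concrete, here is a single-process Python sketch. It is not how Google's library works internally: the in-memory lists stand in for intermediate files, `R = 3` is an arbitrary choice, and `zlib.crc32` is used as an assumed stand-in for the hash function because Python's built-in `hash()` for strings varies between runs.

```python
import re
import zlib
from collections import defaultdict

R = 3  # number of reduce partitions (arbitrary value for this sketch)


def map_fn(doc_name, contents):
    """Emit (word, "1") for every word occurrence, as in the pseudocode."""
    for word in re.findall(r"[A-Za-z]+", contents):
        yield word, "1"


def partition(key, r=R):
    """Stable stand-in for hash(key) mod R."""
    return zlib.crc32(key.encode("utf-8")) % r


def reduce_fn(key, values):
    """Sum the string counts for one word."""
    return key, str(sum(int(v) for v in values))


def run_word_count(documents, r=R):
    # Map phase: each dict plays the role of one partitioned intermediate file.
    partitions = [defaultdict(list) for _ in range(r)]
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            partitions[partition(key, r)][key].append(value)

    # Reduce phase: each partition could go to a different reduce worker.
    results = {}
    for bucket in partitions:
        for key in sorted(bucket):
            word, count = reduce_fn(key, iter(bucket[key]))
            results[word] = count
    return results


docs = {
    "A": "I love programming",
    "B": "I am a blogger, you are also a blogger",
}
counts = run_word_count(docs)
print(counts)
```

Since a given word always hashes to the same partition, each reduce worker sees every count for its words and no worker needs to read the whole intermediate data set.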
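So the "divide" here shows up in two steps: splitting the input into M pieces, and splitting the intermediate results of Map into R pieces. Splitting the input is usually simple. For the intermediate results of Map, "hash(key) mod R" is normally used as the criterion, which guarantees that identical keys appear in the same partition. Users can also supply their own partition function. For URL keys, for instance, if you want all URLs from the same host to land in the same partition, you can use "hash(Hostname(urlkey)) mod R" as the partition function.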
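The two partition functions can be sketched in Python as follows. The helper names are invented for this example, `zlib.crc32` is an assumed substitute for the unspecified hash function, and `urlsplit(...).hostname` plays the role of `Hostname(urlkey)`.

```python
import zlib
from urllib.parse import urlsplit


def stable_hash(text):
    """Deterministic hash; stands in for the unspecified hash()."""
    return zlib.crc32(text.encode("utf-8"))


def default_partition(key, r):
    """hash(key) mod R"""
    return stable_hash(key) % r


def host_partition(url_key, r):
    """hash(Hostname(urlkey)) mod R"""
    host = urlsplit(url_key).hostname or ""
    return stable_hash(host) % r


R = 4
urls = [
    "http://example.com/index.html",
    "http://example.com/about.html",
    "http://example.org/index.html",
]

by_key = [default_partition(u, R) for u in urls]
by_host = [host_partition(u, R) for u in urls]
print(by_key, by_host)
```

With the default function, two pages of the same site may or may not share a partition; with the host-based function they always do, because only the host name is hashed.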
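In the example above, each document may well produce thousands of intermediate results such as ("the", 1), and such fragmented intermediate files inevitably waste transfer capacity. For this reason MapReduce also lets users supply a Combiner function. It usually has the same implementation as the Reduce function; the difference is that the output of Reduce is the final result, whereas the output of the Combiner is an intermediate file that serves as one of the inputs to a Reduce function.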
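A small Python sketch of the effect of a combiner, assuming integer counts and an invented sample sentence: the map output is summed locally per key before anything would be handed to the reduce side.

```python
import re
from collections import Counter


def map_fn(contents):
    """Emit (word, 1) for every word occurrence."""
    for word in re.findall(r"[A-Za-z]+", contents):
        yield word, 1


def combine(pairs):
    """Sum counts per key on the map side; same logic a reducer would use."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


document = "the cat and the dog and the bird"  # made-up sample text

raw = list(map_fn(document))   # what would be written without a combiner
combined = combine(raw)        # what is written with a combiner

print(len(raw), len(combined))
print(combined)
```

This only works because addition is associative and commutative, so summing partial sums gives the same total as summing the individual ones.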
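Tom White gives another very intuitive example from Nutch [2]: distributed grep. I have always felt that many pipe operations, such as more, grep and cat, resemble a kind of Map operation, while sort, uniq, wc and the like amount to some kind of Reduce operation.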
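A toy Python version of distributed grep, in the spirit of the pipe analogy: the map step filters lines like `grep`, and a counting step afterwards behaves like `wc -l`. The file names, contents and key layout `(file_name, line_no)` are assumptions made for this sketch, and everything runs in one process.

```python
import re


def grep_map(file_name, contents, pattern):
    """Map: emit a line only if it matches the pattern (like grep)."""
    for line_no, line in enumerate(contents.splitlines(), start=1):
        if re.search(pattern, line):
            yield (file_name, line_no), line


def identity_reduce(key, values):
    """Reduce: pass the matching lines through unchanged."""
    for value in values:
        yield key, value


def count_reduce(values):
    """A wc-like reduce: collapse many records into one number."""
    return sum(1 for _ in values)


files = {
    "a.txt": "map emits pairs\nnothing here\nreduce merges pairs",
    "b.txt": "pairs again",
}

matches = []
for name, text in files.items():
    for key, line in grep_map(name, text, r"pairs"):
        matches.extend(identity_reduce(key, [line]))

match_count = count_reduce(matches)
print(matches)
print(match_count)
```

Each file can be scanned independently, which is what makes the grep-like step a natural fit for the Map phase.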
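Together with the BigTable paper that Google published just a couple of days ago, Google now has its own cluster (Google Cluster), distributed file system (GFS), distributed computing environment (MapReduce) and distributed structured storage (BigTable), plus a Lock Service. Beyond Google's famous free dinners, I can really sense another kind of free dinner for programmers: the large clusters built from huge numbers of commodity PCs. I think this is where Google's core value truly lies.
Heh, just as Microsoft veteran Joel Spolsky (you have probably read his "Joel on Software"?) once said, the scariest thing for Microsoft [1] is that while Microsoft is still struggling to catch up with Google on search features, Google is already deploying the next generation of supercomputer.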
The very fact that Google invented MapReduce, and Microsoft didn’t, says something about why Microsoft is still playing catch up trying to get basic search features to work, while Google has moved on to the next problem: building Skynet^H^H^H^H^H^H the world’s largest massively parallel supercomputer. I don’t think Microsoft completely understands just how far behind they are on that wave.
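Note 1: Microsoft does in fact have its own solution, Dryad. The problem is that in a big company, redeploying such a low-level piece of infrastructure is extremely hard, whether for technical or for political reasons.
Note 2: Project Hadoop is another major work by Doug Cutting, the father of Lucene. It consists of the Hadoop distributed file system and an implementation of Map/Reduce, so the Lucene/Nutch product line is now fairly complete.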