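<?xml version="1.0" encoding="utf-8" standalone="yes"?>
This is a write-up of a talk from the Alibaba Technology Carnival. We considered something similar at Baidu, so the talk resonated with me, and I have organized the relevant content here.
Out of respect for copyright, here are the original link and the speaker:
http://adc.alibabatech.org/carnival/history/schedule/2013/detail/main/286?video=0
The talk was given by Wu Wei (吴威), an engineer at Alibaba.
One caveat first: cross-data-center Hadoop probably has few use cases. Giants in China such as BAT may need it, but most small and medium companies probably do not, so this may be a dragon-slaying skill (impressive, but rarely needed).
The topic is covered in three parts: the background of the problem, the difficulties in solving it, and the final solution.
(1) Background:
Why build one large cluster that spans data centers?
A large cluster makes data management and authorization easy, which matters a lot in a big company with many departments. It also makes cross-department data use easy, because nobody has to pull the same data repeatedly.
Once the cluster reaches a certain size, a single data center (whose capacity is limited) can no longer meet its needs. To solve the problem once and for all, you need a Hadoop cluster that spans data centers.
(2) Technical challenges:
2.1 NameNode performance:
In a very large Hadoop cluster the original NameNode is a single node, so it becomes a performance bottleneck. The problems fall into two groups: storage capacity (holding the metadata) and compute pressure (handling RPC requests; modifying the in-memory tree requires a global lock).
Storage capacity can be addressed by scaling memory vertically, but compute pressure is hard to address with better hardware, because vendors are currently moving toward more cores rather than higher clock speeds.
2.2 Network limits between data centers:
The network between data centers is always a hard constraint. Cross-data-center transfers bring both latency and bandwidth limits:
1. Latency is generally within 10ms. Most of what runs on Hadoop is offline jobs, so this is basically acceptable.
2. Bandwidth is the bigger problem. Point-to-point bandwidth inside a data center is typically 1Gbps, while bandwidth between data centers is only around 20Mbps, which is very limited.
2.3 Managing resource groups
Each department can be treated as a resource group. Groups may use each other's data, so planning where computation and storage are placed is important; otherwise large amounts of data get copied between data centers.
(3) Solution:
[Figure: architecture of the cross-data-center Hadoop cluster]
Three points in the architecture are worth highlighting, one for each of the three problems above:
1. The diagram shows two NNs (NameNodes). They still belong to a single Hadoop cluster. This is an existing industry solution, HDFS Federation, which addresses metadata-node performance.
2. The diagram shows a cross node. It synchronizes data between the two data centers, and its design takes the inter-data-center network limits into account.
3. Finally there are groupA and groupB, which are used to handle the relationship between data producers and data consumers.
3.1 Federation
Reference material on Federation: [link not preserved]
To scale the NameNode horizontally, Federation uses multiple mutually independent NameNodes. They do not need to communicate with each other, and every DataNode registers with all of the NameNodes and reports to each of them.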
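A BlockPool is the set of blocks that belongs to one NameNode, and BlockPools are independent of each other as well.
Federation raises one question worth attention: how are the addresses of multiple NameNodes made transparent to users? The answer is directory-tree mounting (the community's viewFS appears to exist for exactly this purpose). Anyone familiar with Linux or NFS knows the concept of mount, and directory-tree mounting means the same thing.
Directory-tree mounting has its own problem, though: the storage under each subdirectory has to be managed by hand so that it does not become seriously unbalanced.
3.2 crossNode
The network limits between data centers mean that large, long-running data copies must be avoided, so a dedicated process manages copies between data centers. It is called crossNode. It is deployed as a separate node, apart from the metadata nodes.
Its functions can be summarized in three points:
a. copy data according to a preset list of cross-data-center files
b. handle real-time data copy requests
c. throttle cross-data-center data traffic
How is the list of cross-data-center files obtained? Offline jobs are mostly triggered on a schedule, so the list can be derived by analyzing historical jobs.
3.3 Managing resource groups
Resource groups depend on each other's data. The goal of resource-group management is that most jobs produce their output inside the local data center, with only a few producing output across data centers, and that most jobs read the local replica of their data, with only a few reading across data centers.
To express the data dependency between resource groups, a distance between groups is defined: the more data one group reads from another, the closer the two are. Groups that are close to each other should be placed in the same data center.
To keep computation and output as close together as possible, an MRProxy handles different kinds of jobs differently:
a. Offline computation: if data on the cross-data-center list is still in transit (DC1->DC2), the Job on DC2 is held back from scheduling until the transfer completes.
b. Ad-hoc queries: if a Job on DC2 needs data on DC1, the Job is held back, CrossNode is notified, and scheduling resumes once the data has been transferred.
c. Special case: for a cross-data-center Join with a large table on DC1 and a small table on DC2, the Job is scheduled on DC1 and reads the DC2 data directly across data centers without waiting.
These notes are based on the video and the slides only, with no code or documentation to check against, so some points may be misunderstood. Comments are welcome.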
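The three MRProxy rules in section 3.3 can be read as a small decision function. The sketch below is only an illustration of that reading: the talk published no code, so the class, enum, and method names are all hypothetical, and the "input is already local" case is an assumption added to make the function total.

```java
public class MrProxySketch {
    /** Job categories named in the notes above. */
    public enum JobType { OFFLINE, AD_HOC, CROSS_DC_JOIN }

    /** Hypothetical scheduling outcomes; the names are illustrative only. */
    public enum Action {
        SCHEDULE_NOW,              // assumption: input already local, nothing to wait for
        HOLD_UNTIL_TRANSFER_DONE,  // rule a
        REQUEST_COPY_THEN_RESUME,  // rule b
        RUN_NEXT_TO_LARGE_TABLE    // rule c
    }

    public static Action decide(JobType type, boolean inputIsLocal) {
        if (type == JobType.CROSS_DC_JOIN) {
            // Rule c: run in the data center holding the large table and
            // read the small table across data centers without waiting.
            return Action.RUN_NEXT_TO_LARGE_TABLE;
        }
        if (inputIsLocal) {
            return Action.SCHEDULE_NOW;
        }
        if (type == JobType.OFFLINE) {
            // Rule a: the input is on the preset list and still in transit.
            return Action.HOLD_UNTIL_TRANSFER_DONE;
        }
        // Rule b: pause the job, notify CrossNode, resume after the copy.
        return Action.REQUEST_COPY_THEN_RESUME;
    }
}
```

The point of the split is that only ad-hoc queries trigger a copy on demand; offline jobs rely on the list that was derived from job history.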
]]>
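First, Dremel uses a columnar storage model. Columnar storage is easy for primitive types, but Dremel can also decompose nested types into primitive types and store those by column, which is well worth studying.<br />
Here is a direct comparison of a nested type stored row by row and stored column by column after decomposition:
For nested data, Dremel defines three kinds of fields:
1. required: the field must appear exactly once
2. optional: the field may appear 0 or 1 times
3. repeated: the field may appear 0 or N times
Take the example from the paper:
Here DocId is a required field, so it appears exactly once in each of r1 and r2. Url is an optional field: it is absent from the third Name of r1 and appears once in each of the first two Names of r1. Backward is a repeated field: it is absent from the Links of r1 and appears twice in the Links of r2.<br />
With that understood, here is how Dremel stores the example: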
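Every entry in the table above carries two attributes: "r" is the repetition level and "d" is the definition level. They are defined as follows:
repetition level: at what repeated field in the field's path the value has repeated. It records at which repeated level along its path the value repeated.
definition level: how many fields in p that could be undefined (because they are optional or repeated) are actually present, where p is the field's path. It records how many of the optional or repeated fields above the value (fields that could have been null) actually have a value.
If you are muttering "WTF" at this point, don't worry: an example makes it clearer.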
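Start with the repetition level (r below), using Name.Language.Code as the example:
1) The first value of a record always has r = 0, so 'en-us' has r = 0.
2) The 2nd value, 'en', follows 'en-us'. The repetition happened at the Language level, and Language is the second repeated field on the path Name.Language, so r = 2.
3) The 3rd value, null, is a placeholder recording that 'en-gb' belongs to the third Name rather than the second. The previous value is 'en', and the repetition happened at the Name level, so r = 1.
4) The 4th value, 'en-gb', follows null. This repetition is also at the Name level, so r = 1.
5) The 5th value, null, follows 'en-gb'. The two values are in different Documents, so r = 0.
To summarize, two things matter when working out a repetition level: 1. compare the value only with the value immediately before it; 2. count how many repeated fields lie on the path down to the point where the two values repeat.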
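Now the definition level (d below), again using Name.Language.Code:
1) For 'en-us', both Name and Language above it are present, so d = 2. (For any non-null value, the optional or repeated fields above it must all be present, so d is the same for every non-null value in the column; only null entries have differing d values.)<br />
2) For 'en', d = 2 for the same reason.
3) For null, only Name is present above it and Language is not, so d = 1.
4) For 'en-gb', d = 2 as well.
5) For the last null, again only Name is present and Language is not, so d = 1.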
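The two walkthroughs above can be checked mechanically. The following is a minimal sketch that stripes only the Name.Language.Code column; it is not Dremel's general algorithm. The input model (a Document as a list of Names, a Name as a list of Language codes) and all class and method names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DremelLevels {
    /** Emits "value r=.. d=.." entries for the column Name.Language.Code. */
    public static List<String> stripe(List<List<List<String>>> docs) {
        List<String> column = new ArrayList<>();
        for (List<List<String>> names : docs) {
            if (names.isEmpty()) {
                // No Name at all: nothing on the path is defined.
                column.add(entry(null, 0, 0));
                continue;
            }
            for (int n = 0; n < names.size(); n++) {
                // First entry of a record has r = 0; a later Name repeats at level 1.
                int rName = (n == 0) ? 0 : 1;
                List<String> codes = names.get(n);
                if (codes.isEmpty()) {
                    // Name present, Language absent: NULL placeholder with d = 1.
                    column.add(entry(null, rName, 1));
                    continue;
                }
                for (int l = 0; l < codes.size(); l++) {
                    // A further Language inside the same Name repeats at level 2.
                    int r = (l == 0) ? rName : 2;
                    column.add(entry(codes.get(l), r, 2));
                }
            }
        }
        return column;
    }

    private static String entry(String value, int r, int d) {
        return (value == null ? "NULL" : value) + " r=" + r + " d=" + d;
    }

    /** The records r1 and r2 from the example, reduced to their Language codes. */
    public static List<List<List<String>>> paperExample() {
        List<List<String>> r1 = Arrays.<List<String>>asList(
                Arrays.asList("en-us", "en"),
                Collections.<String>emptyList(),
                Arrays.asList("en-gb"));
        List<List<String>> r2 = Arrays.<List<String>>asList(
                Collections.<String>emptyList());
        return Arrays.<List<List<String>>>asList(r1, r2);
    }
}
```

Calling `DremelLevels.stripe(DremelLevels.paperExample())` yields five entries whose r and d values match the ten steps listed above.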
That covers only how Dremel stores nested types. How anyone came up with this scheme is beyond me... For more, see the original paper and the analyses available online.<br />
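Dictionary encoding: used for String fields<br />
Run-Length encoding: used for int, long, short, and similar types
Bit encoding: can be used for any data type<br />
1. Dictionary encoding:
Each String field keeps its own dictionary and records the position of every value in that dictionary. The dictionary is held in a red-black tree. Each String field ends up with three output Streams: StringOutput (the values in the dictionary), LengthOutput (the length of each dictionary value), and RowOutput (the position in the dictionary of each row's value).
Question 1: why a red-black tree?<br />
Because insertion, deletion, and lookup in a red-black tree all perform evenly at O(logN), and since it is a balanced search tree, the worst case does not degrade to O(N).
Question 2: storage usually applies a compressor such as LZO anyway, which is itself a form of dictionary compression. Why does Orc do its own dictionary encoding?<br />
Because compressors like LZO generally use a small window (64KB by default for LZO), whereas Orc's dictionary covers the whole field, so the compression ratio is better.<br />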
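To make the three Streams concrete, here is a minimal sketch of dictionary encoding for one String column. It is not Orc's writer code: the class name is made up, the streams are plain lists, lengths are assumed to be UTF-8 byte counts, and null values are assumed to have been filtered out beforehand. `java.util.TreeMap` is used because it is a red-black tree.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class DictionarySketch {
    /** Distinct values in sorted order (the StringOutput stream). */
    public final List<String> stringOutput = new ArrayList<>();
    /** Byte length of each dictionary value (the LengthOutput stream). */
    public final List<Integer> lengthOutput = new ArrayList<>();
    /** For each row, the position of its value in the dictionary (the RowOutput stream). */
    public final List<Integer> rowOutput = new ArrayList<>();

    public DictionarySketch(List<String> column) {
        // Pass 1: collect the distinct values; the tree keeps them sorted.
        TreeMap<String, Integer> dictionary = new TreeMap<>();
        for (String value : column) {
            dictionary.put(value, -1);
        }
        // Pass 2: number the values in sorted order and emit the dictionary.
        for (String value : new ArrayList<>(dictionary.keySet())) {
            dictionary.put(value, stringOutput.size());
            stringOutput.add(value);
            lengthOutput.add(value.getBytes(StandardCharsets.UTF_8).length);
        }
        // Pass 3: replace every row by its dictionary position.
        for (String value : column) {
            rowOutput.add(dictionary.get(value));
        }
    }
}
```

With many repeated strings, the row stream holds only small integers, which is where the space saving comes from.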
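2. Run-Length encoding:
Fields of type int, long, and short use Run-Length encoding. This Run-Length encoding can compress arithmetic sequences (a run of identical values counts as an arithmetic sequence with difference 0), provided the sequence meets two conditions:
1. it contains at least 3 elements
2. the difference is between -128 and 127 (because the difference is stored in 1 Byte)
Numbers that do not form such a sequence can still be stored by Run-Length encoding, just without any compression. The layout is as follows:
The first Byte is the Control Byte, with a value between -128 and 127. Values -1 to -128 mean that 1 to 128 numbers that do not form an arithmetic sequence follow; values 0 to 127 mean that an arithmetic sequence of 3 to 130 numbers follows.
If Control Byte >= 0, the next Byte stores the difference; otherwise that Byte is absent.
If Control Byte >= 0, the first number of the arithmetic sequence comes next; otherwise -Control Byte literal numbers follow.
Example:
Original numbers: 12,12,12,12,12,10,7,13
After Run-Length encoding: 2,0,12,-3,10,7,13
In the original post, red marks the Control Bytes (2 and -3), yellow marks the difference (0), and black marks the actual numbers.<br />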
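The layout rules above translate fairly directly into code. This sketch follows the rules as described in this post rather than Orc's actual writer: it emits a list of integers instead of bytes (so it skips whatever variable-length integer format the real file uses), and the class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLengthSketch {
    static final int MIN_RUN = 3;        // shortest sequence stored as a run
    static final int MAX_RUN = 130;      // control 127 means 127 + 3 values
    static final int MAX_LITERALS = 128; // control -128 means 128 literal values

    public static List<Integer> encode(int[] in) {
        List<Integer> out = new ArrayList<>();
        List<Integer> literals = new ArrayList<>();
        int i = 0;
        while (i < in.length) {
            int run = runLengthAt(in, i);
            if (run >= MIN_RUN) {
                flushLiterals(literals, out);
                out.add(run - MIN_RUN);      // control value >= 0
                out.add(in[i + 1] - in[i]);  // difference, known to fit in one byte
                out.add(in[i]);              // first number of the sequence
                i += run;
            } else {
                literals.add(in[i++]);
                if (literals.size() == MAX_LITERALS) flushLiterals(literals, out);
            }
        }
        flushLiterals(literals, out);
        return out;
    }

    /** Length of the arithmetic sequence starting at start (1 if none can start there). */
    private static int runLengthAt(int[] in, int start) {
        if (start + 1 >= in.length) return 1;
        long delta = (long) in[start + 1] - in[start];
        if (delta < -128 || delta > 127) return 1;
        int len = 2;
        while (start + len < in.length && len < MAX_RUN
                && (long) in[start + len] - in[start + len - 1] == delta) {
            len++;
        }
        return len;
    }

    private static void flushLiterals(List<Integer> literals, List<Integer> out) {
        if (literals.isEmpty()) return;
        out.add(-literals.size());           // negative control value
        out.addAll(literals);
        literals.clear();
    }

    public static List<Integer> decode(List<Integer> enc) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < enc.size()) {
            int control = enc.get(i++);
            if (control >= 0) {
                int delta = enc.get(i++);
                int base = enc.get(i++);
                for (int k = 0; k < control + MIN_RUN; k++) out.add(base + k * delta);
            } else {
                for (int k = 0; k < -control; k++) out.add(enc.get(i++));
            }
        }
        return out;
    }
}
```

Encoding the example input 12,12,12,12,12,10,7,13 with this sketch produces 2,0,12,-3,10,7,13, the same sequence shown above, and `decode` restores the original numbers.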
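3. Bit encoding:
Fields of any type can use Bit encoding to record whether a value is null. Before a field of any type is written, the writer checks whether the value is null: if so the bit is stored as 0, otherwise as 1, and null values are not stored at all in the actual encoding. After Bit encoding, every 8 bits are packed into one Byte, and the resulting Bytes are then Run-Length encoded.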
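The packing step can be sketched in a few lines. The bit order (first value in the most significant bit) is an assumption here, since the post does not state it, and the class name is made up. The sketch stops before the follow-up Run-Length pass over the packed bytes.

```java
public class PresentBitsSketch {
    /** Packs one bit per value: 1 if the value is present, 0 if it is null. */
    public static byte[] pack(Object[] values) {
        byte[] out = new byte[(values.length + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                // Assumed order: value 0 goes to the highest bit of byte 0.
                out[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        return out;
    }
}
```

A column that is mostly non-null packs into long runs of 0xFF bytes, which is why applying Run-Length encoding afterwards pays off.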
Beyond these three encodings, Orc also flattens Hive's complex types (array, map, list, and so on) into primitive types for storage. This is also worth borrowing, and I will analyze it when I have time.
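First, the Orc file format, shown in an official figure:
Each Orc file consists of 1 or more stripes, each 250MB in size. A Stripe corresponds to the RowGroup concept in the earlier rcfile, but the size grows from 4MB to 250MB, which should improve sequential-read throughput. Each Stripe has three parts: Index Data, Row Data, and Stripe Footer:
1. Index Data: a lightweight index, built every 10,000 rows by default. As far as I can tell it only records the offset of each field of a given row within Row Data; reportedly it also holds the max and min of each Column, but I have not read that code closely.
2. Row Data: the actual data. As in RCfile, a batch of rows is taken and then stored column by column. Unlike RCfile, each column is encoded and split into multiple Streams; the encodings are covered in the next post of this analysis.
3. Stripe Footer: the type, length, and other information for each Stream.
Each file has a File Footer, which stores the row count of each Stripe, the data type of each Column, and so on. The very end of each file is a PostScript, which records the compression type of the whole file and the length of the File Footer, among other things. To read a file, the reader seeks to the end and reads the PostScript, parses the File Footer length from it, reads the File Footer, parses the information for each Stripe from it, and then reads the Stripes. In other words, the file is read from back to front.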
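The back-to-front read boils down to offset arithmetic. The sketch below assumes one detail the post does not mention but the ORC format specifies: the final byte of the file holds the PostScript length. It also assumes the File Footer sits immediately before the PostScript, as the post describes. Parsing the PostScript and Footer themselves is out of scope, and the class and method names are hypothetical.

```java
public class OrcTailSketch {
    /** Offset of the PostScript, given the length read from the file's last byte. */
    public static long postScriptOffset(long fileLength, int postScriptLength) {
        // Layout at the end of the file: ... | File Footer | PostScript | 1 length byte
        return fileLength - 1 - postScriptLength;
    }

    /** Offset of the File Footer, given the footer length parsed from the PostScript. */
    public static long footerOffset(long fileLength, int postScriptLength, long footerLength) {
        return postScriptOffset(fileLength, postScriptLength) - footerLength;
    }
}
```

A reader would seek to `fileLength - 1` first, then to the two offsets in turn, and only then to the individual Stripes.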
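Next, what ORCFile improves over RCfile. The table is from a slide deck by the Orc author; each row is explained below:
Hive type model: RCfile does not keep type information in the underlying storage; everything is stored as a Byte stream
Separate complex columns: Orc splits complex types apart for storage
Splits Found Quickly: I don't fully understand this one<br />
Default Column group size: self-explanatory
Files per a bucket: I don't fully understand this one
Store min, max, count, sum: storing these makes it possible to skip a stripe quickly
Versioned metadata: I don't fully understand this one
Run-Length Data-coding: integer types use Run-Length variable-length encoding
Store Strings in dictionary: String types use dictionary encoding
Store Row Count: each Stripe stores its row count
Skip Compressed blocks: compressed blocks can be skipped directly
Store internal indexes: a lightweight index is stored
Overall the Orc code is fairly clear and readable. We also ran tests, and compression improved noticeably over RCfile. Anyone interested is welcome to take a look. A second post will follow, mainly on the encodings Orc uses.<br />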
Most material online says that the combiner runs on the map side: it is applied to the map output, and the combined data is then passed to the reducer. A problem I ran into at work, however, showed me that the combiner can also run on the reducer side, and that it can run multiple times.
A search turned up that this is a feature introduced in hadoop-0.18:
Changed policy for running combiner. The combiner may be run multiple times as the map's output is sorted and merged. Additionally, it may be run on the reduce side as data is merged. The old semantics are available in Hadoop 0.18 if the user calls: job.setCombineOnlyOnce(true).
So the combiner actually runs on both the mapper side and the reducer side. From reading the code, combine happens at the following points:
1. On the mapper side, in the spill phase: combine runs when the records in the buffer exceed the threshold
if (spstart != spindex) {
…
combineAndSpill(kvIter, combineInputCounter);
}
2. On the mapper side, in the merge phase: combine runs when the number of spill files being merged is >= 3
if (null == combinerClass || numSpills < minSpillsForCombine) {
Merger.writeFile(kvIter, writer, reporter);
} else {
combineCollector.setWriter(writer);
combineAndSpill(kvIter, combineInputCounter);
}
3. On the reducer side, combine is always performed
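A practical consequence of the three points above is that a combiner has to give the same final result whether it runs zero, one, or several times. The sketch below shows this in plain Java with no Hadoop dependency; the class and method names are invented for the illustration and are not Hadoop APIs. Summing survives repeated combining, while a naive "mean of means" does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CombinerSketch {
    /** Sum combiner: collapses a group of partial values into one partial sum. */
    public static List<Integer> combineSum(List<Integer> values) {
        return Collections.singletonList(reduceSum(values));
    }

    public static int reduceSum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    /** Naive mean combiner: unsafe, because it forgets how many values it averaged. */
    public static List<Double> combineMeanNaive(List<Double> values) {
        return Collections.singletonList(reduceMean(values));
    }

    public static double reduceMean(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total / values.size();
    }

    /** Stands in for merging two spill files or two map outputs. */
    public static <T> List<T> concat(List<T> a, List<T> b) {
        List<T> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }
}
```

For spills {1, 2, 3} and {10}, combining each spill, merging, and combining again still sums to 16. The naive mean gives 6.0 after per-spill combining instead of the true 4.0, which is the kind of bug that only shows up once the combiner starts running at merge time or on the reducer side.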