青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

qiezi的學習園地

AS/C/C++/D/Java/JS/Python/Ruby

  C++博客 :: 首頁 :: 新隨筆 ::  ::  :: 管理 ::
周末抽空做了點小測試,根據http://blog.vckbase.com/jzhang/archive/2006/03/28/18807.html中m網友修改的算法,python版本中讀取所有行以后就做一個排序,再去除重復項。這個算法在我的機器上執行時間是1735ms左右,屬于python版本中最快的一個。

D版本暫還沒想到有更優化的做法,D在處理以char[]作key的關聯數組時,判斷方法是先判斷hash,如果hash相等,則繼續做字符串判斷。它執行時間是1120ms左右。

以D版本為基礎,自己寫了一個C++的Email類:

class?Email
{
private:
????????
string?mail;
????????size_t?hash;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?mail(mail_),?hash(my_hash(mail_))
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?lhs.mail?<?rhs.mail;
????????
return?lhs.hash?<?rhs.hash;
}

把它插入set并判斷是否有重復。

這個程序由于string的大量拷貝,以及大量內存分配,執行時間相當長,在我的機器上是5s左右。D和python版本由于對象拷貝成本較低,加上都有內存分配策略,自然有一些優勢。

退而求其次,既然hash沖突的幾率較低,試試只保存hash:

class?Email
{
private:
????????size_t?hash;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_))
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
return?lhs.hash?<?rhs.hash;
}

這次測試就比較快了,耗時僅1020ms左右,比D版本還要快,當然它不是完善的版本。

考慮到構造成本,于是改為只用一個set<int>來保存hash值再測試,這次耗時是930ms。

實際上可以做一個改進的C++版本,一次性讀入文件的全部內容到一個大緩沖區,把所有的\n字符修改為\0,用一個動態數組保存緩沖區的所有字符串指針,hash值也須計算并保存到數組。再用D的索引方式,先hash比較,再字符串比較,效率應該也不低。

實現了一個:
#include?<iostream>
#include?
<string>
#include?
<set>
#include?
<fstream>
#include?
<iterator>
#include?
<sys/time.h>
using?namespace?std;


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????timeval?start,?end;

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;
????????
char*?buffer?=?new?char[BUF_SIZE];
????????memset(buffer,?
0,?BUF_SIZE);

????????gettimeofday(
&start,?0);
????????
set<Email>?emails;

????????size_t?read?
=?fread?(buffer,?BUF_SIZE,?1,?fin);
????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????
if?(emails.insert(begin).second){
????????????????????????????????fputs(begin,?fout);
????????????????????????????????fwrite(
"\n",?1,?1,?fout);
????????????????????????}
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}

??????? fclose(fout);
??????? fclose(fin);

????????gettimeofday(&end,?0);

????????printf(
"Time?used:?%d?ms\n",?((end.tv_sec?-?start.tv_sec)?*?1000?+?(end.tv_usec?-?start.tv_usec)?/?1000));

??????? delete[] buffer;
????????
return?0;
}

memset不是必須的,不過我不知道如何獲取讀入的大小。fread讀取后,如果讀到EOF,則返回值為0。所以我這里用memset先初始化內存,但不把它計入耗時。new操作也沒計入,因為其它語言比如python、D在啟動時都由運行時做了類似工作。

這個程序在我的機器上耗時為1350ms左右。我想可慢在set上,對象拷貝?內存分配?

做了幾個優化版本,沒多大提高。


重新測試了下:
A、python(m網友版本):
lijie t # python test.py
1689.0411377
lijie t # python test.py
1711.40599251
lijie t # python test.py
1699.63312149
lijie t # python test.py
1712.00013161
lijie t # python test.py
1713.8838768

B、D版本:
lijie t # ./testd email.txt email-new.txt
1091
lijie t # ./testd email.txt email-new.txt
1070
lijie t # ./testd email.txt email-new.txt
1062
lijie t # ./testd email.txt email-new.txt
1062
lijie t # ./testd email.txt email-new.txt
1096

C、C++只比較hash,set<Email>版本:
lijie t # ./test3 email.txt email-new.txt
Time used: 981 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 1000 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 980 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 986 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 987 ms

D、C++只比較hash,set<int>版本:
lijie t # ./test4 email.txt email-new.txt
Time used: 951 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 953 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 947 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 950 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 962 ms

E、C++大緩沖區,比較hash和字符串,set<Email>版本:
lijie t # ./test5 email.txt email-new.txt
Time used: 1375 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1359 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1369 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1378 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1396 ms

F、C++大緩沖區,比較字符串版本:
lijie t # ./test6 email.txt email-new.txt
Time used: 1168 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1169 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1171 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1179 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1169 ms

從C、E和F來看,對象拷貝成本是比較高的,E版本僅僅比C版本多了個const char*成員變量,hash值比較散,很少會真的執行到strcmp。保持E版本對象結構不變,把operator <里面的實現改為C版本,測試結果如下:
lijie t # ./test5 email.txt email-new.txt
Time used: 1355 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1360 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1348 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1353 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1379 ms

效率只提高了一點點,這個版本僅僅比C版本多了個成員變量拷貝,竟然慢了這么多。說明Email對象的2個成員變量拷貝成本的確很高。

F版本相比之下反而效率很不錯,主要原因是email數據不夠復雜,僅通過前幾位就可以比較出結果。如果每行數據比較長,而且很多行要到后幾個字符才能比較出來,肯定就不那么快了。

hash值的計算雖然執行了一系列乘法,不過還是相當迅速。

D語言版本執行了hash值和字符串比較,是比較完善的,效率很不錯。C++相應版本看來要提高set的效率才能達到。


jzhang的第一個python版本在我的機器上執行如下:
lijie t # python test2.py
3122.9569912 ms
lijie t # python test2.py
3209.42997932 ms
lijie t # python test2.py
3141.47305489 ms
lijie t # python test2.py
3129.57286835 ms
lijie t # python test2.py
3196.03514671 ms

我做了點修改,執行速度提高了一些:
#remove?duplicated?email?address?from?file
import?datetime
from?time?import?time
if?__name__?==?"__main__":
?start?
=?time()
?hashtable?
=?{}
?f?
=?file("email.txt","r")
?f2?
=?file("email_new.txt","w")
?
for?line?in?f.xreadlines():
??
if?not?hashtable.has_key(line):
???hashtable[line]?
=?1
???f2.write(line)
?f.close()
?f2.close()
?print?(time()?
-?start)?*?1000,?"ms"
在我的機器上執行結果如下:
lijie t # python test1.py
2239.22801018 ms
lijie t # python test1.py
2301.00703239 ms
lijie t # python test1.py
2282.06086159 ms
lijie t # python test1.py
2296.57006264 ms
lijie t # python test1.py
2281.25810623 ms

不過還是沒有m網友的效率高。


在F版本的基礎上,借鑒m網友的做法,實現一個G版本:

G、排序并去除重復元素,比較hash和字符串版本:
#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<sys/time.h>
#include?
<vector>
using?namespace?std;


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}

????????
bool?operator?==?(const?Email&?rhs)
????????{
????????????????
if?(hash?==?rhs.hash)
????????????????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????????????
return?false;
????????}

????????
const?char*?getEmail()const
????????{
????????????????
return?mail;
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????timeval?start,?end;

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;
????????
char*?buffer?=?new?char[BUF_SIZE];
????????memset(buffer,?
0,?BUF_SIZE);

????????gettimeofday(
&start,?0);
????????vector
<Email>?emails;

????????size_t?read?
=?fread?(buffer,?BUF_SIZE,?1,?fin);
????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????emails.push_back(begin);
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}
????????fclose(fin);
????????sort(emails.begin(),?emails.end());
????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++)
????????{
????????????????fputs((
*iter).getEmail(),?fout);
????????????????fwrite(
"\n",?1,?1,?fout);
????????}

????????fclose(fout);

????????gettimeofday(
&end,?0);

????????printf(
"Time?used:?%d?ms\n",?((end.tv_sec?-?start.tv_sec)?*?1000?+?(end.tv_usec?-?start.tv_usec)?/?1000));

????????delete[]?buffer;

????????
return?0;
}

在我的機器上執行如下:
lijie t # ./test7 email.txt email-new.txt
Time used: 676 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 675 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 671 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 669 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 673 ms

比F版本快了2倍,也快過了其它所有版本。不過由于數據是vector保存的,在數據大量重復的情況下,性能可能會有較大的降低。

把operator<和operator==的實現改為strcmp比較,執行結果如下:
lijie t # ./test8 email.txt email-new.txt
Time used: 1275 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1267 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1297 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1296 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1271 ms


修改了下,增加了計時,修正了fread使用錯誤。

#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<vector>
using?namespace?std;

#ifdef?_WIN32
#?include?
<windows.h>
#else?//?_WIN32
#?include?
<sys/time.h>
#endif?//?_WIN32


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}

????????
bool?operator?==?(const?Email&?rhs)
????????{
????????????????
if?(hash?==?rhs.hash)
????????????????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????????????
return?false;
????????}

????????
const?char*?getEmail()const
????????{
????????????????
return?mail;
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

#ifndef?_WIN32
class?Timer
{
????????timeval?begin,?end;
public:
????????
void?start?()?{gettimeofday(&begin,?0);}
????????
void?stop?()?{gettimeofday(&end,?0);}
????????size_t?milliseconds?()?
const?{
????????????????
return?(end.tv_sec?-?begin.tv_sec)?*?1000?+?(end.tv_usec?-?begin.tv_usec)?/?1000;
????????}
};
#else?//?_WIN32
class?Timer
{
????????DWORD?begin,?end;
public:
????????
void?start?()?{begin?=?GetTickCount();}
????????
void?stop?()?{end?=?GetTickCount();}
????????size_t?milliseconds?()?
const?{
????????????????
return?end?-?begin;
????????}
};
#endif?//?_WIN32


int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????
for?(int?i=0;?i<10;?++i)?{

???????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????Timer?total,?part;
????????total.start();
????????part.start();

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;

????????
char*?buffer?=?new?char[BUF_SIZE];
????????
if?(!buffer){
????????????????
//?cout?<<?"Alloc?buffer?failed"?<<?endl;
????????????????return?4;
????????}

????????part.stop();
????????
//?cout?<<?"Alloc?buffer,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????size_t?read?
=?fread?(buffer,?1,?BUF_SIZE,?fin);
????????fclose(fin);
????????buffer[read]?
=?'\0';
????????part.stop();
????????
//?cout?<<?"Read?file,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????vector
<Email>?emails;

????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????emails.push_back(begin);
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}

????????part.stop();
????????
//?cout?<<?"Put?emails?into?vector,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????sort(emails.begin(),?emails.end());
????????part.stop();
????????
//?cout?<<?"Sort?emails,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());
????????part.stop();
????????
//?cout?<<?"Unique?emails,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++)
????????{
????????????????fputs((
*iter).getEmail(),?fout);
????????????????fwrite(
"\n",?1,?1,?fout);
????????}

????????fclose(fout);

????????part.stop();
????????
//?cout?<<?"Write?emails?into?new?file,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;

????????total.stop();

????????cout?
<<?"Total?used:?"?<<?total.milliseconds()?<<?"?ms."?<<?endl;

????????delete[]?buffer;
????????}

????????
return?0;
}

使用“-O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4”選項編譯,耗時從680ms減少到620ms


優化文件讀寫版:
#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<vector>
using?namespace?std;

//?config
#define?USE_CACHE
//#define?PROFILE
//?end?config


#ifdef?_WIN32
#?include?
<windows.h>
#else?//?_WIN32
#?include?
<sys/time.h>
#endif?//?_WIN32

#ifndef?_WIN32
class?Timer
{
????timeval?begin,?end;
public:
????
void?start?()?{gettimeofday(&begin,?0);}
????
void?stop?()?{gettimeofday(&end,?0);}
????size_t?milliseconds?()?
const?{
????????
return?(end.tv_sec?-?begin.tv_sec)?*?1000?+?(end.tv_usec?-?begin.tv_usec)?/?1000;
????}
};
#else?//?_WIN32
class?Timer
{
????DWORD?begin,?end;
public:
????
void?start?()?{begin?=?GetTickCount();}
????
void?stop?()?{end?=?GetTickCount();}
????size_t?milliseconds?()?
const?{
????????
return?end?-?begin;
????}
};
#endif?//?_WIN32


#ifdef?PROFILE
#?define?PROFILE_OUTPUT(timer,info)?\
????timer.stop();?\
????cout?
<<?info?<<?":?"?<<?timer.milliseconds()?<<?"?ms?used."?<<?endl;?\
????timer.start()
#else?//?PROFILE
#?define?PROFILE_OUTPUT(timer,info)
#endif?//?PROFILE



size_t?my_hash?(
const?char*?str)
{
????size_t?ret?
=?0;
????
while?(*str)
????????????ret?
=?11?*?ret?+?*str++;
????
return?ret;
}

class?Email
{
private:
????size_t?hash;
????
const?char*?mail;
????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????Email?(
const?char*?mail_)
????????:?hash(my_hash(mail_)),?mail(mail_)
????{
????}

????
bool?operator?==?(const?Email&?rhs)
????{
????????
if?(hash?==?rhs.hash)
????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????
return?false;
????}

????
const?char*?getEmail()?const
????{
????????
return?mail;
????}

????size_t?getLength()?
const
????{
????????
return?strlen(mail);
????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????
if?(lhs.hash?==?rhs.hash)
????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????
return?lhs.hash?<?rhs.hash;
}



#ifdef?USE_CACHE
class?OfstreamBuffer
{
????ofstream
&?ofs;
????size_t?buf_size;
????size_t?offset;
????
char*?buffer;

public:
????OfstreamBuffer?(ofstream
&?ofs_,?size_t?buf_size_)
????????:?ofs(ofs_),?buf_size(buf_size_),?offset(
0)
????{
????????buffer?
=?new?char[buf_size_];
????}

????
~OfstreamBuffer?()
????{
????????flush();
????????delete[]?buffer;
????}

????
void?write?(const?char*?ptr,?size_t?size)
????{
????????
while?(size?>?0)
????????{
????????????size_t?copy_size?
=?buf_size?-?offset;
????????????
if?(copy_size?>?size)
????????????????copy_size?
=?size;
????????????memcpy?(buffer?
+?offset,?ptr,?copy_size);
????????????offset?
+=?copy_size;
????????????ptr?
+=?copy_size;
????????????size?
-=?copy_size;

????????????
if?(offset?==?buf_size)
????????????????flush?();
????????}
????}

????
void?flush?()
????{
????????
if?(offset?>?0)
????????{
????????????ofs.write(buffer,?offset);
????????????offset?
=?0;
????????}
????????ofs.flush();
????}
};
#else//?USE_CACHE
class?OfstreamBuffer
{
????ofstream
&?ofs;

public:
????OfstreamBuffer?(ofstream
&?ofs_,?size_t?buf_size_)
????????:?ofs(ofs_)
????{
????}

????
void?write?(const?char*?ptr,?size_t?size)
????{
????????ofs.write(ptr,?size);
????}

????
void?flush?()
????{
????????ofs.flush();
????}
};
#endif?//?USE_CACHE




int?main(int?argc,?char**?argv)
{
????
if?(argc?<?3)
????{
????????cout?
<<?"Wrong?arguments"?<<?endl;
????????
return?1;
????}

????
for?(int?i=0;?i<1;?++i)?{

????????ifstream?ifs(argv[
1],?ios::binary);
????????
if?(!ifs)
????????{
????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????
return?2;
????????}

????????ofstream?ofs(argv[
2],?ios::binary);
????????
if?(!ofs)
????????{
????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????
return?3;
????????}

????????Timer?total,?part;
????????total.start();
????????part.start();

????????ifs.seekg(
0,?ios_base::end);
????????size_t?file_size?
=?(size_t)ifs.tellg()?+?1;
????????cout?
<<?"file?size:?"?<<?file_size?<<?endl;
????????ifs.seekg(
0,?ios_base::beg);

????????
char*?buffer?=?new?char[file_size];
????????
if?(!buffer){
????????????cout?
<<?"Alloc?buffer?failed"?<<?endl;
????????????
return?4;
????????}

????????PROFILE_OUTPUT(part,?
"Alloc?buffer");

????????ifs.read(buffer,?file_size);
????????buffer[file_size]?
=?'\0';

????????PROFILE_OUTPUT(part,?
"Read?file");

????????vector
<Email>?emails;
????????emails.reserve(
1000000);

????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????
if?(*current?==?'\n')
????????????{
????????????????
*current?=?'\0';
????????????????emails.push_back(begin);
????????????????begin?
=?current?+?1;
????????????}
????????????
++?current;
????????}

????????PROFILE_OUTPUT(part,?
"Put?emails?into?vector");

????????sort(emails.begin(),?emails.end());

????????PROFILE_OUTPUT(part,?
"Sort?emails");

????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());

????????PROFILE_OUTPUT(part,?
"Unique?emails");

????????OfstreamBuffer?ofsBuffer?(ofs,?
4?*?1024);

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++?)
????????{
????????????ofsBuffer.write(iter
->getEmail(),?iter->getLength());
????????????ofsBuffer.write(
"\n",?1);
????????}
????????ofsBuffer.flush();

????????PROFILE_OUTPUT(part,?
"Write?emails?into?new?file");

????????total.stop();
????????cout?
<<?"Total?used:?"?<<?total.milliseconds()?<<?"?ms."?<<?endl;


????????delete[]?buffer;
????}

????
return?0;
}
這個版本在windows上用dev-cpp所帶的gcc 3.4.2來編譯,最好成績是609ms。在cygwin里可以達到650ms。
posted on 2006-04-03 11:00 qiezi 閱讀(2811) 評論(27)  編輯 收藏 引用 所屬分類: C++雜談
青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品
  • <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            麻豆精品视频在线观看| 亚洲精品在线观| 亚洲精品一区二区三区蜜桃久 | 欧美不卡福利| 久久成人18免费观看| 欧美日韩亚洲激情| 亚洲国产毛片完整版 | 另类激情亚洲| 另类国产ts人妖高潮视频| 国产精品久久久久永久免费观看| 亚洲福利视频二区| 在线精品高清中文字幕| 欧美在线一级va免费观看| 午夜精品久久久久影视 | 嫩草国产精品入口| 久久网站热最新地址| 国产欧美一区二区精品秋霞影院 | 久久精品国产2020观看福利| 午夜精品www| 国产精品swag| 一区二区不卡在线视频 午夜欧美不卡在 | 午夜精品www| 香蕉久久久久久久av网站| 欧美视频在线观看免费| 亚洲日本黄色| 亚洲精选久久| 欧美精品一线| 日韩视频永久免费| 99日韩精品| 欧美三级午夜理伦三级中文幕 | 欧美在线观看一区| 久久精品在线观看| 国际精品欧美精品| 久久青草久久| 亚洲高清免费| 中文久久精品| 国产精品视频免费观看www| 99这里只有久久精品视频| 一区二区日韩精品| 国产精品久久波多野结衣| 亚洲中字在线| 久久色在线观看| 亚洲国产另类久久久精品极度| 麻豆国产精品777777在线| 亚洲国产乱码最新视频| 亚洲图片在线| 国产农村妇女毛片精品久久莱园子 | 欧美成人午夜影院| 亚洲精品日韩在线观看| 欧美私人网站| 久久超碰97人人做人人爱| 亚洲第一精品影视| 亚洲在线播放| 在线免费观看一区二区三区| 欧美激情欧美狂野欧美精品| 亚洲一二三级电影| 狼人社综合社区| 一本久久青青| 国产亚洲一级高清| 欧美精品色一区二区三区| 亚洲欧美成人精品| 欧美黄色一区| 午夜综合激情| 亚洲人成绝费网站色www| 国产精品久久久久毛片软件 | 午夜精品一区二区在线观看 | 亚洲精品乱码久久久久久蜜桃麻豆 | 亚洲永久网站| 亚洲国产视频直播| 国产女主播一区二区三区| 麻豆精品国产91久久久久久| 亚洲婷婷综合久久一本伊一区| 狂野欧美激情性xxxx欧美| 亚洲一区日韩在线| 亚洲电影毛片| 国产精品专区第二| 欧美精品久久久久久久久老牛影院| 亚洲你懂的在线视频| 亚洲区免费影片| 免费亚洲电影| 久久国产加勒比精品无码| 9久re热视频在线精品| 激情五月综合色婷婷一区二区| 国产精品豆花视频| 欧美激情精品久久久久久| 欧美在线观看视频一区二区三区| 日韩一级精品视频在线观看| 欧美a级大片| 久久亚洲图片| 欧美在线视频二区| 亚洲尤物精选| 在线综合视频| 99re亚洲国产精品| 亚洲黄色尤物视频| 亚洲高清久久久| 国产一区二区三区av电影 | 国产在线视频欧美| 国产精品午夜电影| 国产精品成人va在线观看| 欧美日韩免费一区二区三区| 欧美风情在线观看| 欧美福利一区二区三区| 猫咪成人在线观看| 久久先锋资源| 美女精品网站| 免费高清在线视频一区·| 久久亚洲不卡| 久久综合综合久久综合| 久久一区精品| 欧美肥婆在线| 欧美激情五月| 欧美日韩系列| 国产精品久久夜| 国产日韩一区二区三区在线播放| 国产日韩精品一区二区| 国产欧美一级| 在线观看不卡| 亚洲靠逼com| 在线一区免费观看| 亚洲一区免费看| 欧美一区二区在线免费播放| 欧美综合国产精品久久丁香| 久久久国产一区二区| 久久一日本道色综合久久| 欧美岛国在线观看| 亚洲精品美女91| 亚洲永久在线观看| 久久精品国产亚洲aⅴ| 欧美~级网站不卡| 欧美午夜国产| 国内精品久久久久影院色| 欲色影视综合吧| 一区二区三区四区五区视频| 亚洲欧美日韩综合| 久久男人资源视频| 亚洲国产一区在线| 亚洲一二三级电影| 久久偷看各类wc女厕嘘嘘偷窃| 欧美老女人xx| 国产日产欧美精品| 亚洲欧洲精品一区| 亚洲欧美中文另类| 蜜桃av综合| 一本色道久久综合亚洲精品小说| 欧美一区二区三区喷汁尤物| 欧美激情va永久在线播放| 国产精品日韩欧美一区二区三区| 韩日精品视频一区| 在线视频日韩| 欧美成人a视频| 制服诱惑一区二区| 久热精品视频在线| 国产精品三上| 亚洲乱码国产乱码精品精| 欧美一区深夜视频| 亚洲人成网站精品片在线观看| 亚洲免费视频成人| 欧美成人午夜免费视在线看片| 国产乱子伦一区二区三区国色天香 | 久久精品综合网| 亚洲精品一区二区三区在线观看 | 久久精品综合一区| 99国产精品私拍| 男女精品网站| 狠狠色丁香婷婷综合影院| 亚洲欧美精品中文字幕在线| 欧美激情精品久久久久久蜜臀| 亚洲欧美日韩国产中文在线| 欧美精品在线一区| 亚洲国产成人高清精品| 久久久99国产精品免费| 亚洲天堂av图片| 欧美美女日韩| 亚洲精品免费网站| 美女91精品| 久久精品国产综合精品| 亚洲国产精品久久91精品| 新67194成人永久网站| 欧美午夜无遮挡| 国内精品视频在线播放| 国产一区二区三区网站| 亚洲一区二区在线免费观看| 亚洲人成网站影音先锋播放| 久久亚洲精品网站| 影音先锋久久精品| 久久夜色精品| 久久精品国产99国产精品| 国产一本一道久久香蕉| 欧美资源在线| 午夜伦欧美伦电影理论片| 国产精品嫩草影院一区二区| 亚洲欧美日韩中文在线制服| 一区二区三区欧美成人| 国产精品白丝黑袜喷水久久久| 中文精品在线| 亚洲一区在线直播| 国产欧美高清| 久久亚洲色图| 欧美二区视频| 亚洲图片自拍偷拍|