ivy-jie

Filtering Low-Frequency Words

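Problem: Write a program that removes the least frequent words from a text containing a large number of words. If several words are tied for the lowest count, remove all of them.
Input: The program reads a large text file named corpus.txt. The file contains English and Chinese words, separated by one or more whitespace characters (tabs, spaces and newlines are collectively called "whitespace characters"). (For debugging you can download a test corpus.txt file; the actual run will use an input file with different contents.)
Output: Print to standard output the text of corpus.txt with the least frequent words removed. The remaining words keep their original order and are still separated by spaces.
Scoring criteria:
The output must be correct; the less memory the program uses the better, and the faster it runs the better.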
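#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>
using namespace std;

typedef map<string,int>::iterator mit;
typedef string::size_type sit;

int main()
{
    map<string,int> words_count;   // word -> number of occurrences
    vector<string> sve;            // all words in their original order
    string word;
    const string s = ",!?.:\";'";  // punctuation that ends a word (the quote must be escaped)
    ifstream fin("corpus.txt");
    if (!fin)
    {
        cerr << "unable to open file" << endl;
        return EXIT_FAILURE;
    }
    // Read the words and count them
    while (fin >> word)
    {
        sit pos = word.find_first_of(s);
        if (pos != string::npos)
            word.erase(pos);       // cut the word at the first punctuation mark
        if (word.empty())
            continue;              // the token was punctuation only
        // Lowercase in place; going through unsigned char keeps non-ASCII bytes
        // (e.g. Chinese text) from triggering undefined behavior in tolower
        transform(word.begin(), word.end(), word.begin(),
                  [](unsigned char c) { return static_cast<char>(tolower(c)); });
        sve.push_back(word);
        ++words_count[word];
    }
    fin.close();

    if (words_count.empty())
        return 0;                  // empty input: nothing to print

    // Find the smallest count
    int n = words_count.begin()->second;
    for (mit i = words_count.begin(); i != words_count.end(); ++i)
        if (i->second < n) n = i->second;

    // Remove every word with the smallest count in a single pass
    sve.erase(remove_if(sve.begin(), sve.end(),
                        [&words_count, n](const string& w) { return words_count[w] == n; }),
              sve.end());

    // Print the remaining words
    copy(sve.begin(), sve.end(), ostream_iterator<string>(cout, " "));
    cout << endl;
    return 0;
}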

posted on 2009-05-20 08:46 by ivy-jie, category: arithmetic
