免费高清在线视频一区·,狠狠色狠狠色综合日日小说,久久激情五月激情

2006百度之星程序設計大賽試題-百度語言翻譯機(解答)

題目:
題目+我的解答打包下載
http://www.shnenglu.com/Files/zuroc/06_baidustar_translator.zip
百度語言翻譯機

百度的工程師們是非常注重效率的，在長期的開發與測試過程中，他們逐漸創造了一套獨特的縮略語。他們在平時的交談、會議，甚至在各種技術文檔中都會大量運用。

為了讓新員工可以更快地適應百度的文化，更好地閱讀公司的技術文檔，人力資源部決定開發一套專用的翻譯系統，把相關文檔中的縮略語和專有名詞翻譯成日常語言。

輸入要求：
輸入數據包含三部分：
1. 第一行包含一個整數N(N<=10000)，表示總共有多少個縮略語的詞條；

2. 緊接著有N行的輸入，每行包含兩個字符串，以空格隔開。第一個字符串為縮略語（僅包含大寫英文字符，長度不超過10字節），第二個字符串為日常語言（不包含空格，長度不超過255字節）；

3. 從第N+2開始到輸入結束為包含縮略語的相關文檔（總長度不超過1000000個字節）。例：

6
PS 門戶搜索部
NLP 自然語言處理
PM 產品市場部
HR 人力資源部
PMD 產品推廣部
MD 市場發展部
百度的部門包括PS，PM，HR，PMD，MD等等，其中PS還包括NLP小組。
樣例：in.txt

輸出要求：
輸出將縮略語轉換成日常語言后的文檔。（將縮略語轉換成日常語言，其他字符保留原樣）。例：

百度的部門包括門戶搜索部，產品市場部，人力資源部，產品推廣部，市場發展部等等，其中門戶搜索部還包括自然語言處理小組。
樣例：out.txt

評分規則：

1．程序將運行在一臺Linux機器上（內存使用不作嚴格限制），在每一測試用例上運行不能超過10秒，否則該用例不得分；

2．要求程序能按照輸入樣例的格式讀取數據文件，按照輸出樣例的格式將運行結果輸出到標準輸出上。如果不能正確讀入數據和輸出數據，該題將不得分；

3．該題目共有4個測試用例，每個測試用例為一個輸入文件。各測試用例占該題目分數的比例分別為25%，25%，25%，25%；

4．該題目20分。

注意事項：
1．輸入數據是中英文混合的，中文采用GBK編碼。
GBK：是又一個漢字編碼標準，全稱《漢字內碼擴展規范》。采用雙字節表示，總體編碼范圍為 8140-FEFE，首字節在 81-FE 之間，尾字節在40-FE 之間，排除xx7F。總計 23940 個碼位，共收入 21886 個漢字和圖形符號，其中漢字（包括部首和構件）21003 個，圖形符號 883 個。

2．為保證答案的唯一性，縮略語的轉換采用正向最大匹配（從左到右為正方向）原則。請注意樣例中PMD的翻譯。

代碼:
/*
我的思路

1.縮略語
vector< string >   //用來保存縮略語
按string的length排序,來滿足"縮略語的轉換采用正向最大匹配".

2.一次性的進行文本替換,以防止替換內容再次被替換
map<pair<int,int>,string>       //位置范圍-縮略語
vector<pair<int,int>>   //保存位置范圍
map<string,string>   //縮略語
*/

#include <fstream>
#include <sstream>
#include <iostream>

#include <vector>
#include <map>
#include <list>

#include <string>

#include <algorithm>
#include <functional>

using namespace std;

#define BEG_END(c)       (c.begin()),(c.end())

typedef string::size_type str_size;

/** 轉換string為指定的類型 */
template<typename Target, typename Source>
Target lexical_cast(const Source& arg)
{
    Target result;
    istringstream(arg)>>result;
    return result;
}

vector<str_size> find_all(const string& source , const string& aim)
{

    vector<str_size>   poses;

    str_size pos=0;
    str_size aim_len=aim.size();

    while ( (pos=source.find(aim, pos)) != string::npos)
    {
            poses.push_back(pos);
            pos += aim_len;
    }

    return poses;
}

bool is_long(const string& a , const string& b)
{
    return a.length()>b.length();
}

bool is_first_small(const pair<str_size,str_size>& a , const pair<str_size,str_size>& b)
{
    return a.first<b.first;
}

template<class T,class I>
bool not_in_scope(I begin,const I& end,const T& aim)
{
    for (;begin!=end;++begin)
    {
        if (
            (aim>=(begin->first) ) && (aim<= (begin->first+begin->second) )
        )return false;
    }
    return true;
}

int main()
{

    string infile_name="in.txt" , outfile_name="out.txt";

    ofstream outfile(outfile_name.c_str());
    //ostream& outfile = cout;

    ifstream infile(infile_name.c_str());
    if (!infile)
    {
        cerr<<"Error : can't open input file "<<infile_name<<" .\n";
        return -1;
    }

    string line;
    vector<string> abbr_dict;
    map<string,string>   abbr_word;

    getline(infile,line);
    for (int i=lexical_cast<int>(line);i!=0;--i)
    {
        getline(infile,line);
        string abbr,word;
        istringstream(line)>>abbr>>word;
        abbr_dict.push_back(abbr);
        abbr_word[abbr]=word;
        //cout<<abbr<<' '<<word<<'\n';
    }

    sort(BEG_END(abbr_dict),is_long);

    while (getline(infile,line))
    {
        typedef vector<pair<str_size,str_size> > replace_scope;

        replace_scope   to_replace_scope;
        map<pair<str_size,str_size>,string>   to_replace;

        for (
            vector<string>::iterator i=abbr_dict.begin(),end=abbr_dict.end();
            i!=end;
            ++i
        )
        {
            vector<str_size>   poses=find_all(line,*i);
            str_size aim_len=i->size();
            for (vector<str_size>::iterator j=poses.begin(),end=poses.end();j
                    !=end;++j)
            {
                pair<str_size,str_size> scope=make_pair(*j,aim_len);
                if (not_in_scope(BEG_END(to_replace_scope),*j))
                {
                    to_replace_scope.push_back(scope);
                    to_replace[scope]=*i;
                }
            }
        }

        sort(BEG_END(to_replace_scope),is_first_small);

        str_size offset=0;

        for (
            replace_scope::iterator i=to_replace_scope.begin(),end=to_replace_scope.end();
            i!=end;
            ++i
        )
        {
            str_size len=i->second ;
            string word=abbr_word[to_replace[*i]];
            line.replace(i->first+offset,len ,word);
            offset+=word.size()-len;
        }

        outfile<<line<<'\n';
    }

    return 0;
}

posted on 2007-05-13 21:25 張沈鵬閱讀(1037) 評論(0) 編輯收藏引用所屬分類: C++

只有注冊用戶登錄后才能發表評論。
【推薦】100%開源！大型工業跨平臺軟件C++源碼提供，建模，組態！

相關文章: 2006百度之星程序設計大賽試題-變態比賽規則(解答) 2006百度之星程序設計大賽試題-百度語言翻譯機(解答) [翻譯]Berkeley DB 文檔 - C++入門篇 - 1.3節 - 訪問方式(Access Methods) [意譯]Berkeley DB 文檔 - C++入門篇 - 1.2節 - Berkeley DB 概述 C++ std名字空間ostream_iterator與的詭異問題 boost::asio網絡庫教程翻譯更新中。 Boost.Asio 0.37教程 Timer.1(翻譯自Boost.Asio 0.37的文檔) Boost.Asio 0.37簡介(翻譯自Boost.Asio 0.37的文檔的首頁) 由boost網絡庫說起... boost::sockets 候選庫

網站導航: 博客園 IT新聞 BlogJava 博問 Chat2DB 管理

導航

常用鏈接

留言簿(3)

隨筆分類(44)

隨筆檔案(65)

相冊

友情Link

最新隨筆

搜索

積分與排名

最新評論

閱讀排行榜