清風竹林


Python Challenge lv2: ocr

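Puzzle link: http://www.pythonchallenge.com/pc/def/ocr.html
According to the hint, the task is to find the rare characters in a block of text in the HTML page source. What counts as "rare" is not clear yet, but that does not matter: first save the whole block of text to a file named fin.txt, then take a first pass over it by counting how often each character occurs: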

    if __name__ == '__main__':
        finpath = 'fin.txt'
        with open(finpath) as fin:
            # join the lines into a single string, dropping trailing whitespace
            text = ''.join(line.rstrip() for line in fin)
        # count how often each character occurs
        d = {}
        for c in text:
            d[c] = d.get(c, 0) + 1
        for k, v in d.items():
            print(k, v)

Output:

! 6079
# 6115
% 6104
$ 6046
& 6043
) 6186
( 6154
+ 6066
* 6034
@ 6157
[ 6108
] 6152
_ 6112
^ 6030
a 1
e 1
i 1
l 1
q 1
u 1
t 1
y 1
{ 6046
} 6105
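
As an aside, the same tally could be sketched with `collections.Counter` from the standard library, which is not what this post uses. Here `tally` is a hypothetical helper name and the sample string is a made-up stand-in for the contents of fin.txt. Sorting by count puts the least frequent characters first, so the rare ones are easy to spot.

```python
from collections import Counter

def tally(text):
    """Return (character, count) pairs, least frequent first."""
    counts = Counter(text)
    # sorted() is stable, so characters with equal counts keep first-seen order
    return sorted(counts.items(), key=lambda kv: kv[1])

if __name__ == '__main__':
    # made-up stand-in for the puzzle text, not the real data
    sample = '((a))**b**(('
    for ch, n in tally(sample):
        print(ch, n)
```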

Now it is obvious: the rare characters are the letters that occur exactly once. A small change to the code prints the result:

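    if __name__ == '__main__':
        finpath = 'fin.txt'
        with open(finpath) as fin:
            # join the lines into a single string, dropping trailing whitespace
            text = ''.join(line.rstrip() for line in fin)
        # count how often each character occurs
        d = {}
        for c in text:
            d[c] = d.get(c, 0) + 1
        # keep only the characters that occur exactly once, in original order
        print(''.join(c for c in text if d[c] == 1))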

Program output: equality
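
The idea of "rare" can also be made explicit as a threshold. The following is a minimal sketch, assuming a hypothetical helper `rare_chars` with a `max_count` parameter (neither appears in the original code); the sample string is invented for illustration and is not the puzzle data.

```python
from collections import Counter

def rare_chars(text, max_count=1):
    """Keep, in original order, the characters that occur at most max_count times."""
    counts = Counter(text)
    return ''.join(c for c in text if counts[c] <= max_count)

if __name__ == '__main__':
    # invented sample: '#' and '@' are frequent, the letters occur once each
    sample = '##e##q@@u@@a'
    print(rare_chars(sample))
```

With the default `max_count=1` this behaves like the `d[c] == 1` test above; a larger threshold would be needed only if the hidden characters could repeat.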

Since every character left out of the result is a non-letter, the following approach also works:

    if __name__ == '__main__':
        finpath = 'fin.txt'
        with open(finpath) as fin:
            # join the lines into a single string, dropping trailing whitespace
            text = ''.join(line.rstrip() for line in fin)
        # only print letters
        print(''.join(c for c in text if c.isalpha()))
        # another method: filter() with the same predicate
        print(''.join(filter(lambda x: x.isalpha(), text)))
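
One caveat, offered as a hedged aside: `str.isalpha()` is true for any Unicode letter, not only a-z and A-Z. For this puzzle that makes no difference, but if only ASCII letters are wanted, a stricter filter could look like the sketch below. `ascii_letters_only` is a hypothetical helper name and the sample string is made up.

```python
import string

def ascii_letters_only(text):
    """Keep only the characters A-Z and a-z, in original order."""
    allowed = set(string.ascii_letters)
    return ''.join(c for c in text if c in allowed)

if __name__ == '__main__':
    # made-up sample containing one non-ASCII letter
    sample = '#e[q]u(é)a!'
    # isalpha() keeps the accented letter as well
    print(''.join(c for c in sample if c.isalpha()))
    # the stricter filter drops it
    print(ascii_letters_only(sample))
```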

Reference answers

posted on 2009-05-11 15:40 by 李現民, category: python
