NLP Data Collection
SunRise_at, Thu, 20 Sep 2012

There are plenty of datasets available for download online, but their content is a hodgepodge; below I have picked out the more useful ones.
Wikis:
Everyone knows Wikipedia; its dumps can be downloaded from http://dumps.wikimedia.org/ , and there is a detailed introduction here: http://en.wikipedia.org/wiki/Wikipedia:Database_download
But Wikipedia is only one project of the Wikimedia Foundation; Wikimedia runs several other important projects, including:
Wiktionary    a linked dictionary for the semantic web, similar in form to WordNet
Wikiquote    a collection of famous quotations
Wikibooks    free textbooks and manuals
Wikinews    a large amount of news content
Wikiversity    free educational materials
Wikisource    free source texts
All of the above can be downloaded through http://dumps.wikimedia.org/ .
There are also some smaller wiki projects, for example:
http://simple.wikipedia.org    a wiki written in Basic English, for children and beginners
http://simple.wiktionary.org    a Wiktionary written in Basic English

There are many ways to process Wikipedia data; the two I recommend most are:
jwpl:    http://code.google.com/p/jwpl/
wikipedia-miner:   http://wikipedia-miner.cms.waikato.ac.nz/wiki/
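The dumps behind these tools are very large XML files, so any hand-rolled processing should stream rather than load the whole file. A minimal sketch of streaming page extraction (the inline sample string stands in for a real pages-articles dump; real dumps additionally carry an XML namespace, omitted here):

```python
import xml.etree.ElementTree as ET
import io

# A tiny stand-in for a pages-articles dump; real dumps use the same
# <page><title>...<revision><text>... layout (plus a namespace, stripped here).
sample = b"""<mediawiki>
  <page><title>Alan Turing</title><revision><text>British mathematician.</text></revision></page>
  <page><title>Entropy</title><revision><text>A measure of uncertainty.</text></revision></page>
</mediawiki>"""

def iter_pages(stream):
    """Yield (title, text) pairs, freeing each parsed element as we go."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            yield title, text
            elem.clear()  # keep memory bounded on multi-GB dumps

pages = list(iter_pages(io.BytesIO(sample)))
print(pages[0][0])  # Alan Turing
```

For a real dump you would wrap the file in `bz2.open` instead of `io.BytesIO`; the streaming loop stays the same.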

Next, let me introduce a commercial wiki site: http://www.wikia.com , where users can create standalone wiki sites. The top 250 wikia sites are listed here:
http://wikis.wikia.com/wiki/List_of_Wikia_wikis
Wikia's resources are also available for download: http://community.wikia.com/wiki/Help:Database_download

Freebase:
Freebase needs no explanation; here are the data download addresses:
http://wiki.freebase.com/wiki/Data_dumps    Freebase's own data
http://wiki.freebase.com/wiki/WEX    the data Freebase extracted from Wikipedia

YAGO2:
http://www.mpi-inf.mpg.de/yago-naga/yago/

dbpedia:
http://www.dbpedia.org

If you are looking for Linked Data, come here: http://www.thedatahub.org , which has collected a great deal of it.
http://linkeddata.org/    has a diagram showing the relationships among the various Linked Data sets and their influence.

If you are looking for web APIs of all kinds, come here: http://www.programmableweb.com

Foreign governments are increasingly opening their data to the public; here are a few government open datasets:
http://data.gov.au    Australia
http://data.dc.gov    Washington, D.C., United States
http://www.data.gov    United States
http://data.gov.uk    United Kingdom
http://databases.lapl.org/    open datasets for the Los Angeles area (now you know why Silicon Valley is so strong)
http://www.gov.hk/en/theme/psi/welcome    the Hong Kong government has also released a lot of data
Compare: foreign governments have done all this practical work; what are the good-for-nothings in the Great Hall of the People doing?

http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lexAccess/current/web/download.html    word lists published by the U.S. National Library of Medicine
http://www.census.gov/genealogy/www/data/2000surnames/index.html    surname data from the U.S. Census Bureau
https://www.cia.gov/library/publications/download/    the World Factbook published by the CIA, describing the countries of the world
When even health, census, and intelligence agencies contribute this much to America's information infrastructure, we should have a sense of how far behind we are.
Thesauri:
http://www.nlm.nih.gov/mesh/filelist.html    MeSH, a controlled vocabulary for medicine
http://id.loc.gov/download/            subject headings published by the Library of Congress

Some triple data:
http://www.cs.utexas.edu/users/pclark/dart/    harvested from the BNC (British National Corpus) and Reuters; 23 million triples
http://reverb.cs.washington.edu/        a University of Washington project; 15 million triples
http://www.cs.washington.edu/research/sherlock-hornclauses/    roughly 2 to 3 million entries
http://www.cs.rochester.edu/research/knext    about 9.35 million entries, from the BNC and the Brown corpus
http://rtw.ml.cmu.edu/rtw/resources        the Read the Web project; a relatively small dataset

Dictionaries:
http://wordnet.princeton.edu/            the English WordNet
http://nlpwww.nict.go.jp/wn-ja/index.en.html    the Japanese WordNet
http://alpage.inria.fr/~sagot/wolf-en.html    the French WordNet
http://wordnet.ru/                the Russian WordNet
http://cl.haifa.ac.il/projects/mwn/index.shtml    the Hebrew WordNet
http://wordnet.dk/dannet/menu?item=2        the Danish WordNet
http://grial.uab.es/sensem/download?idioma=en    the Spanish WordNet
http://www.ling.helsinki.fi/en/lt/research/finnwordnet/download.shtml    the Finnish WordNet
All of these WordNet versions are free to download. It is a disgrace that China, an ancient civilization with thousands of years of written records, does not have a single free and openly available machine-readable dictionary. People at the CAS Institute of Computing Technology and Institute of Automation, what do you think? (And best wishes to HowNet: may business boom and sales keep climbing.)

http://dico.fj.free.fr/dico.php        a Japanese-French dictionary
http://www.csse.monash.edu.au/~jwb/edict.html    a Japanese-English dictionary
http://cc-cedict.org/wiki/start     a Chinese-to-English dictionary; finally one involving Chinese, though regrettably built by foreigners
https://framenet.icsi.berkeley.edu    a resource based on frame semantics; hardly a dictionary, but there was nowhere else to put it
Corpora:
http://opus.lingfil.uu.se/    open parallel corpora
http://opus.lingfil.uu.se/OpenSubtitles_v2.php    download address for a large number of movie subtitles
http://www.statmt.org/europarl    the European Parliament parallel corpus
http://www.anc.org/OANC/    the Open American National Corpus

http://snap.stanford.edu/data/    the Stanford SNAP project, which crawled a lot of data; it is fairly old now, mainly of research value

SunRise_at, 2012-09-20 17:29
Notes on Setting Up a Wiki Mirror
SunRise_at, Tue, 21 Aug 2012

1. Apache, PHP5, and MySQL are all required; then download the MediaWiki package.

I had never touched any of this software before, so each one had to be installed...

(1) Apache configuration

 On Debian, after installation finishes, the configuration files provided by the package live under the /etc/apache2 directory:

  tony@tonybox:/etc/apache2$ ls -l

  total 72

  -rw-r--r-- 1 root root 12482 2006-01-16 18:15 apache2.conf

  drwxr-xr-x 2 root root 4096 2006-06-30 13:56 conf.d

  -rw-r--r-- 1 root root 748 2006-01-16 18:05 envvars

  -rw-r--r-- 1 root root 268 2006-06-30 13:56 httpd.conf

  -rw-r--r-- 1 root root 12441 2006-01-16 18:15 magic

  drwxr-xr-x 2 root root 4096 2006-06-30 13:56 mods-available

  drwxr-xr-x 2 root root 4096 2006-06-30 13:56 mods-enabled

  -rw-r--r-- 1 root root 10 2006-06-30 13:56 ports.conf

  -rw-r--r-- 1 root root 2266 2006-01-16 18:15 README

  drwxr-xr-x 2 root root 4096 2006-06-30 13:56 sites-available

  drwxr-xr-x 2 root root 4096 2006-06-30 13:56 sites-enabled

  drwxr-xr-x 2 root root 4096 2006-01-16 18:15 

  Of these,

  apache2.conf

  is the main configuration file of the apache2 server. Inspect it and you will find the following:

  # Include module configuration:

  Include /etc/apache2/mods-enabled/*.load

  Include /etc/apache2/mods-enabled/*.conf

  # Include all the user configurations:

  Include /etc/apache2/httpd.conf

  # Include ports listing

  Include /etc/apache2/ports.conf

  # Include generic snippets of statements

  Include /etc/apache2/conf.d/[^.#]*

  From this you can see that apache2 splits the configuration by function, which makes it easier to manage.

  conf.d

  contains supplementary configuration snippets; by default, only a charset snippet is provided:

  tony@tonybox:/etc/apache2/conf.d$ cat charset

  AddDefaultCharset UTF-8

  If needed, we can change the default encoding to GB2312 by making the file's content: AddDefaultCharset GB2312

  httpd.conf

  is an empty file.

  magic

  contains data for the mod_mime_magic module and generally does not need to be modified.

  ports.conf

  is the configuration file for the IP addresses and ports the server listens on:

  tony@tonybox:/etc/apache2$ cat ports.conf

  Listen 80

  mods-available

  contains .conf and .load files, the configuration files for loading each of the modules available to the system, while the mods-enabled directory holds symbolic links pointing at those files. As apache2.conf shows, the system loads modules through the mods-enabled directory; in other words, a module is loaded only when a symlink to its configuration file in mods-available has been created there. The system also provides two commands, a2enmod and a2dismod, for maintaining these symlinks; both come from the apache2-common package. Their usage is very simple: a2enmod [module] and a2dismod [module].
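The enable/disable mechanism described here is nothing more than creating and removing symlinks. It can be demonstrated in a scratch directory (the paths below are illustrative stand-ins, not the real /etc/apache2, and the two functions are simplified analogues of a2enmod/a2dismod, not their actual implementation):

```python
import os
import tempfile

# Illustrative stand-ins for /etc/apache2/mods-available and mods-enabled.
root = tempfile.mkdtemp()
available = os.path.join(root, "mods-available")
enabled = os.path.join(root, "mods-enabled")
os.makedirs(available)
os.makedirs(enabled)

# A module's load file, as shipped in mods-available.
with open(os.path.join(available, "rewrite.load"), "w") as f:
    f.write("LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so\n")

def enable(mod):
    """Roughly what a2enmod does: symlink the module file into mods-enabled."""
    name = mod + ".load"
    os.symlink(os.path.join(available, name), os.path.join(enabled, name))

def disable(mod):
    """Roughly what a2dismod does: remove the symlink again."""
    os.remove(os.path.join(enabled, mod + ".load"))

enable("rewrite")
print(sorted(os.listdir(enabled)))   # ['rewrite.load']
disable("rewrite")
print(sorted(os.listdir(enabled)))   # []
```

Note that disabling only removes the link; the module file itself stays untouched in mods-available, which is exactly why a2dismod is safe to run.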

  sites-available

  contains the configuration files of the configured sites, and the sites-enabled directory holds symbolic links pointing at those files; the system enables a site through these links. The symlinks under sites-enabled carry a numeric prefix, such as 000-default; this number determines the start-up order: the smaller the number, the higher the start-up priority. The system also provides two commands, a2ensite and a2dissite, for maintaining these symlinks; both come from the apache2-common package.

  /var/www

  By default, the web pages to be published should be placed under /var/www; this default can be changed through the DocumentRoot directive in the main configuration file.

  Extract MediaWiki directly into Apache's document root (that is, under /var/www) and rename the extracted directory to wiki.

2. Then open localhost/wiki and install MediaWiki; this creates the database wikidb, which contains 41 tables. Before importing data, first clear the page, revision, and text tables:

delete from page; 

delete from revision; 

delete from text; 
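The effect of this clearing step can be illustrated with an in-memory SQLite database standing in for MediaWiki's MySQL wikidb (the table shapes here are simplified stand-ins; the real tables have many more columns):

```python
import sqlite3

# In-memory stand-in for wikidb; MediaWiki actually uses MySQL, and the
# installer seeds page/revision/text with a sample Main Page.
db = sqlite3.connect(":memory:")
for table in ("page", "revision", "text"):
    db.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, payload TEXT)")
    db.execute(f"INSERT INTO {table} (payload) VALUES ('installer sample row')")

# The clearing step: empty the three content tables before the import,
# so the installer's sample rows do not collide with the dump's IDs.
for table in ("page", "revision", "text"):
    db.execute(f"DELETE FROM {table}")
db.commit()

counts = {t: db.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("page", "revision", "text")}
print(counts)  # {'page': 0, 'revision': 0, 'text': 0}
```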

3. From http://dumps.wikimedia.org/backup-index.html you can download the database XML dump of a wiki in any language. The downloaded file looks like enwiki-20061130-pages-articles.xml.bz2 (the English edition); the wiki dumps are refreshed roughly every two months.

4. Install MediaWiki. Download the MediaWiki source code; if the official site is blocked, you can get it from the Chinese site www.allwiki.com. After downloading, extract it into a directory Apache can find, set the permissions of its config directory to 777, then visit config/index.php in a browser. After you work through the configuration, a LocalSettings.php file is generated in the config directory; copy this file up into its parent directory. Finally, do not forget to change the config directory back to its original permissions.

5. Import the file into the database. Command:
java -Xmx600M -server -jar mwdumper.jar --format=sql:1.5 
enwiki-20061130-pages-articles.xml.bz2 | mysql -u wikiuser -p wikidb 

See: http://fuhao-987.iteye.com/blog/1044933

http://jgs80.blog.163.com/blog/static/3566265320076177435762/



SunRise_at, 2012-08-21 09:22
Penn Treebank Tags
SunRise_at, Tue, 31 Jul 2012

 

Note: This information comes from "Bracketing Guidelines for Treebank II Style Penn Treebank Project" - part of the documentation that comes with the Penn Treebank.

 

Contents:

Bracket Labels

Clause Level

Phrase Level

Word Level

Function Tags

Form/function discrepancies

Grammatical role

Adverbials

Miscellaneous

Index of All Tags

Bracket Labels

Clause Level

S - simple declarative clause, i.e. one that is not introduced by a (possibly empty) subordinating conjunction or a wh-word and that does not exhibit subject-verb inversion.

SBAR - Clause introduced by a (possibly empty) subordinating conjunction.

SBARQ - Direct question introduced by a wh-word or a wh-phrase. Indirect questions and relative clauses should be bracketed as SBAR, not SBARQ.

SINV - Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal.

SQ - Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ.

 

Phrase Level

ADJP - Adjective Phrase.

ADVP - Adverb Phrase.

CONJP - Conjunction Phrase.

FRAG - Fragment.

INTJ - Interjection. Corresponds approximately to the part-of-speech tag UH.

LST - List marker. Includes surrounding punctuation.

NAC - Not a Constituent; used to show the scope of certain prenominal modifiers within an NP.

NP - Noun Phrase.

NX - Used within certain complex NPs to mark the head of the NP. Corresponds very roughly to N-bar level but used quite differently.

PP - Prepositional Phrase.

PRN - Parenthetical.

PRT - Particle. Category for words that should be tagged RP.

QP - Quantifier Phrase (i.e. complex measure/amount phrase); used within NP.

RRC - Reduced Relative Clause.

UCP - Unlike Coordinated Phrase.

VP - Verb Phrase.

WHADJP - Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in how hot.

WHADVP - Wh-adverb Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing a wh-adverb such as how or why.

WHNP - Wh-noun Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. who, which book, whose daughter, none of which, or how many leopards.

WHPP - Wh-prepositional Phrase. Prepositional phrase containing a wh-noun phrase (such as of which or by whose authority) that either introduces a PP gap or is contained by a WHNP.

X - Unknown, uncertain, or unbracketable. X is often used for bracketing typos and in bracketing the...the-constructions.

Word level

CC - Coordinating conjunction

CD - Cardinal number

DT - Determiner

EX - Existential there

FW - Foreign word

IN - Preposition or subordinating conjunction

JJ - Adjective

JJR - Adjective, comparative

JJS - Adjective, superlative

LS - List item marker

MD - Modal

NN - Noun, singular or mass

NNS - Noun, plural

NNP - Proper noun, singular

NNPS - Proper noun, plural

PDT - Predeterminer

POS - Possessive ending

PRP - Personal pronoun

PRP$ - Possessive pronoun (prolog version PRP-S)

RB - Adverb

RBR - Adverb, comparative

RBS - Adverb, superlative

RP - Particle

SYM - Symbol

TO - to

UH - Interjection

VB - Verb, base form

VBD - Verb, past tense

VBG - Verb, gerund or present participle

VBN - Verb, past participle

VBP - Verb, non-3rd person singular present

VBZ - Verb, 3rd person singular present

WDT - Wh-determiner

WP - Wh-pronoun

WP$ - Possessive wh-pronoun (prolog version WP-S)

WRB - Wh-adverb
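Most off-the-shelf POS taggers emit exactly these word-level tags, so a small lookup table is convenient when reading their output. A sketch covering a subset of the tags above (extend the dictionary with the rest of the list as needed; the example sentence's tags are assigned by hand here, not produced by a tagger):

```python
# Subset of the Penn Treebank word-level tagset listed above.
PENN_TAGS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "PRP$": "Possessive pronoun",
    "VB": "Verb, base form",
    "VBZ": "Verb, 3rd person singular present",
    "WRB": "Wh-adverb",
}

def describe(tag):
    """Return a human-readable gloss for a Penn word-level tag."""
    return PENN_TAGS.get(tag, "unknown tag: " + tag)

# Gloss the tags of "The dog barks" (hand-tagged for illustration).
for word, tag in [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]:
    print(word, "->", describe(tag))
```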

Function tags

Form/function discrepancies

-ADV (adverbial) - marks a constituent other than ADVP or PP when it is used adverbially (e.g. NPs or free ("headless" relatives). However, constituents that themselves are modifying an ADVP generally do not get -ADV. If a more specific tag is available (for example, -TMP) then it is used alone and -ADV is implied. See the Adverbials section.

-NOM (nominal) - marks free ("headless") relatives and gerunds when they act nominally.

Grammatical role

-DTV (dative) - marks the dative object in the unshifted form of the double object construction. If the preposition introducing the "dative" object is for, it is considered benefactive (-BNF). -DTV (and -BNF) is only used after verbs that can undergo dative shift.

-LGS (logical subject) - is used to mark the logical subject in passives. It attaches to the NP object of by and not to the PP node itself.

-PRD (predicate) - marks any predicate that is not VP. In the do so construction, the so is annotated as a predicate.

-PUT - marks the locative complement of put.

-SBJ (surface subject) - marks the structural surface subject of both matrix and embedded clauses, including those with null subjects.

-TPC ("topicalized") - marks elements that appear before the subject in a declarative sentence, but in two cases only:

if the front element is associated with a *T* in the position of the gap.

if the fronted element is left-dislocated (i.e. it is associated with a resumptive pronoun in the position of the gap).

-VOC (vocative) - marks nouns of address, regardless of their position in the sentence. It is not coindexed to the subject and does not get -TPC when it is sentence-initial.

Adverbials

Adverbials are generally VP adjuncts.

-BNF (benefactive) - marks the beneficiary of an action (attaches to NP or PP).

This tag is used only when (1) the verb can undergo dative shift and (2) the prepositional variant (with the same meaning) uses for. The prepositional objects of dative-shifting verbs with other prepositions than for (such as to or of) are annotated -DTV.

 

-DIR (direction) - marks adverbials that answer the questions "from where?" and "to where?" It implies motion, which can be metaphorical as in "...rose 5 pts. to 57-1/2" or "increased 70% to 5.8 billion yen" -DIR is most often used with verbs of motion/transit and financial verbs.

-EXT (extent) - marks adverbial phrases that describe the spatial extent of an activity. -EXT was incorporated primarily for cases of movement in financial space, but is also used in analogous situations elsewhere. Obligatory complements do not receive -EXT. Words such as fully and completely are absolutes and do not receive -EXT.

-LOC (locative) - marks adverbials that indicate place/setting of the event. -LOC may also indicate metaphorical location. There is likely to be some variation in the use of -LOC due to differing annotator interpretations. In cases where the annotator is faced with a choice between -LOC or -TMP, the default is -LOC. In cases involving SBAR, SBAR should not receive -LOC. -LOC has some uses that are not adverbial, such as with place names that are adjoined to other NPs and NAC-LOC premodifiers of NPs. The special tag -PUT is used for the locative argument of put.

-MNR (manner) - marks adverbials that indicate manner, including instrument phrases.

-PRP (purpose or reason) - marks purpose or reason clauses and PPs.

-TMP (temporal) - marks temporal or aspectual adverbials that answer the questions when, how often, or how long. It has some uses that are not strictly adverbial, such as with dates that modify other NPs at S- or VP-level. In cases of apposition involving SBAR, the SBAR should not be labeled -TMP. Only in "financialspeak," and only when the dominating PP is a PP-DIR, may temporal modifiers be put at PP object level. Note that -TMP is not used in possessive phrases.

 

Miscellaneous

-CLR (closely related) - marks constituents that occupy some middle ground between arguments and adjuncts of the verb phrase. These roughly correspond to "predication adjuncts", prepositional ditransitives, and some "phrasal verbs". Although constituents marked with -CLR are not strictly speaking complements, they are treated as complements whenever it makes a bracketing difference. The precise meaning of -CLR depends somewhat on the category of the phrase.

on S or SBAR - These categories are usually arguments, so the -CLR tag indicates that the clause is more adverbial than normal clausal arguments. The most common case is the infinitival semi-complement of use, but there are a variety of other cases.

on PP, ADVP, SBAR-PRP, etc - On categories that are ordinarily interpreted as (adjunct) adverbials, -CLR indicates a somewhat closer relationship to the verb. For example:

Prepositional Ditransitives

In order to ensure consistency, the Treebank recognizes only a limited class of verbs that take more than one complement (-DTV and -PUT and Small Clauses). Verbs that fall outside these classes (including most of the prepositional ditransitive verbs in class [D2]) are often associated with -CLR.

Phrasal verbs

Phrasal verbs are also annotated with -CLR or a combination of -PRT and PP-CLR. Words that are considered borderline between particle and adverb are often bracketed with ADVP-CLR.

Predication Adjuncts

Many of Quirk's predication adjuncts are annotated with -CLR.

on NP - To the extent that -CLR is used on NPs, it indicates that the NP is part of some kind of "fixed phrase" or expression, such as take care of. Variation is more likely for NPs than for other uses of -CLR.

-CLF (cleft) - marks it-clefts ("true clefts") and may be added to the labels S, SINV, or SQ.

-HLN (headline) - marks headlines and datelines. Note that headlines and datelines always constitute a unit of text that is structurally independent from the following sentence.

-TTL (title) - is attached to the top node of a title when this title appears inside running text. -TTL implies -NOM. The internal structure of the title is bracketed as usual.

Index of All Tags

ADJP

-ADV

ADVP

-BNF

CC

CD

-CLF

-CLR

CONJP

-DIR

DT

-DTV

EX

-EXT

FRAG

FW

-HLN

IN

INTJ

JJ

JJR

JJS

-LGS

-LOC

LS

LST

MD

-MNR

NAC

NN

NNS

NNP

NNPS

-NOM

NP

NX

PDT

POS

PP

-PRD

PRN

PRP

-PRP

PRP$ or PRP-S

PRT

-PUT

QP

RB

RBR

RBS

RP

RRC

S

SBAR

SBARQ

-SBJ

SINV

SQ

SYM

-TMP

TO

-TPC

-TTL

UCP

UH

VB

VBD

VBG

VBN

VBP

VBZ

-VOC

VP

WDT

WHADJP

WHADVP

WHNP

WHPP

WP

WP$ or WP-S

WRB

X

 

 



SunRise_at, 2012-07-31 13:31
Recall and Precision
SunRise_at, Mon, 23 Jul 2012

Reposted from: http://uwei.blogbus.com/logs/11424864.html

As an outsider to the Internet industry, I didn't know many of the concepts. Take the most basic ones, "recall" and "precision": reading about them online gave me a rough idea, and when I used them myself I could reason my way to the meaning, but when someone else brought them up in conversation I still could not react immediately and had to think it through all over again. So I gathered some material to sort the two concepts out, hoping to handle them more fluently.

Recall and precision are two important concepts and metrics in the design of search engines (and other retrieval systems).
Recall is also called "completeness"; precision is also called "accuracy" or "correctness".
When retrieving documents from a large collection, the documents can be divided into four classes:

                 relevant    not relevant
retrieved           A             B
not retrieved       C             D

A: retrieved, relevant            (found, and wanted)
B: retrieved, but not relevant    (found, but useless)
C: not retrieved, yet relevant    (not found, though actually wanted)
D: not retrieved, not relevant    (not found, and useless)

Usually we hope that as many as possible of the relevant documents in the database are retrieved; this is pursuing recall, A/(A+C), the bigger the better.
We also hope that among the retrieved documents as many as possible are relevant and as few as possible are irrelevant; this is pursuing precision, A/(A+B), the bigger the better.

To summarize:
recall = relevant documents retrieved / all relevant documents in the collection
precision = relevant documents retrieved / all documents retrieved

Although recall and precision have no necessary relationship (as the formulas above show), on large data collections the two metrics constrain each other.
Because no retrieval strategy is perfect, when the strategy is loosened in the hope of retrieving more relevant documents, some irrelevant results usually come along with them, and precision suffers.
Conversely, when the strategy is tightened to remove irrelevant documents from the results, some relevant documents can no longer be retrieved, and recall suffers.

Any retrieval or selection over a large data collection involves these two metrics, and since they constrain each other, we usually pick an appropriate operating point for the retrieval strategy, neither too strict nor too loose, seeking a balance between recall and precision; where that balance sits is determined by the concrete requirements.

Precision is actually the easier of the two to grasp; the one that is hard to react to quickly is recall, and I suspect the Chinese translation, 召回率, is partly to blame: in everyday Chinese, 召回 means to call something back, as when Sony recalls defective batteries, and that sense gives no hint of the metric's meaning. The English word "recall", besides "order sth. to return", also means "remember":

Recall: the ability to remember sth. that you have learned or sth. that has happened in the past.

That is the sense intended here, and it makes the term much easier to understand. When we ask a retrieval system for all the details of some event (that is, issue a query), recall measures how many of the event's details the system can "remember": the details it recalls, divided by all the details the system ever knew about the event. Seen as a "memory rate", recall becomes much easier to think about.

SunRise_at, 2012-07-23 09:41

The Beauty of Mathematics: Markov Chains
SunRise_at, Thu, 12 Jul 2012

Recently I have been reading the book "The Beauty of Mathematics" (数学之美). Since I lacked the background on Markov chains, I studied them a little. For someone who has not looked at probability theory in ages, picking it back up took some time; for example, P(A|B) denotes the probability of A given B, and basics like this all needed revisiting first. As the saying goes, reviewing the old teaches you the new. With that in place, the property of the Markov chain discussed there is not hard to understand either: at each step you can move to any one of the adjacent points, and here the probability of moving to each of those points is the same.

For the definition of a Markov chain, see: http://zh.wikipedia.org/wiki/%E9%A6%AC%E5%8F%AF%E5%A4%AB%E9%8F%88

The hidden Markov model is an extension of the Markov chain above: the state s_t at any time t is not observable. Instead, at each time t the hidden Markov model emits a symbol that is related to s_t, and only to s_t; this is called the independent-output assumption. For successful applications of hidden Markov models, see chapter 5 of Wu Jun's "The Beauty of Mathematics".
Well, it is almost time for work; stopping here. Back to grinding out code...

SunRise_at, 2012-07-12 08:54

Statistical NLP: Mutual Information
SunRise_at, Fri, 01 Jun 2012

It is June 1st, and C-jia is not around; what a rascal. For the task at hand I need to read Manning's "Foundations of Statistical Natural Language Processing" and then use mutual information. Every time a name sounds impressive, it turns out not to be that hard once I actually start on it.

Collocations
Collocations are described by a limited set of compound word-formation rules.
There are three methods for identifying collocation pairs: 1. identification using frequency information; 2. identification based on the mean and variance of the distance between the head word and its collocate; 3. identification based on hypothesis testing and mutual information.

1. Frequency
After filtering the corpus, pair up the verbs and nouns two by two and count how often each pair occurs within a sentence or within a paragraph; that count is the frequency.

2. Mean and variance
Because the distance between two words can vary, compute the mean and variance of the offsets between them.
The mean is simply the average offset.
The variance measures how far the individual offsets deviate from the mean:

[formula image lost: http://www.shnenglu.com/images/cppblog_com/sunrise/QQ截图20120601103629.png ; in it, d_i is the offset of the i-th pair and d-bar is the mean offset]

We can use this information to discover collocations, specifically by looking for word pairs with a low deviation. A low deviation means the two words usually occur at roughly the same distance; a zero deviation means they always occur at exactly the same distance.
The variance is a measure of how peaked the distribution of one word is relative to the other.

On mutual information
The mutual information is computed by this formula:

MI(a,b) = log( p(ab) / (p(a)*p(b)) )

where the logarithm is base 2 and p(x) denotes the probability that x occurs.
Well then, that simple. Time to start writing code.

SunRise_at, 2012-06-01 13:06

FAQ Question Answering Based on HowNet Semantic Similarity
SunRise_at, Thu, 12 Apr 2012

The new task now that the synonym dictionary is built: read the literature and work out an approach to the problem. The papers I found all study sentence-to-sentence similarity computation, while our key problem is word-to-sentence similarity computation. FAQ answering is said to be a hot research topic in natural language processing; after reading a few papers, they all feel much the same.
Since this is my first contact with the area, there were many unfamiliar terms, which I looked up on my own.
On HowNet, see http://www.keenage.com/zhiwang/c_zhiwang.html
   The core problem of an automatic FAQ answering system is how to quickly compare the customer's question against the questions in the FAQ database and determine the one most similar to it; if such a question exists, its answer is returned to the customer as the result.<br />
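As a rough illustration of this matching step, here is a minimal sketch that uses plain bag-of-words cosine similarity as a stand-in for the HowNet-based semantic similarity this post is about; the FAQ entries and the 0.5 threshold are invented for the example:

```python
import math
from collections import Counter

def cosine_sim(q1, q2):
    """Cosine similarity between two whitespace-tokenized questions."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

def answer(user_question, faq, threshold=0.5):
    """Return the answer of the most similar FAQ question, or None
    when no entry is similar enough."""
    best_q = max(faq, key=lambda q: cosine_sim(user_question, q))
    return faq[best_q] if cosine_sim(user_question, best_q) >= threshold else None

faq = {
    "how do i reset my password": "Use the 'forgot password' link.",
    "where can i download the manual": "See the support page.",
}
print(answer("how to reset my password", faq))
```

The threshold keeps the system from returning an answer when nothing in the database is actually close to the user's question.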

                                                                                    [Figure: FAQ system architecture diagram]<br />      The similarity computation proceeds in stages: first sememe similarity, then concept similarity, then word similarity, and finally sentence similarity.<br />     /Files/sunrise/相似度.doc — the formulas cannot be displayed here, so the similarity calculations are attached in this file.<br />     That is about as far as the FAQ work has gone. A novice programmer's novice article, and the novice will keep on being a novice.</p>
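The sememe → concept → word → sentence pipeline described above can be sketched minimally as follows. The sememe tree, the ALPHA constant, and the word-to-concept table below are toy stand-ins rather than real HowNet data; the sememe formula alpha / (alpha + distance) follows the commonly used form from the HowNet-similarity literature:

```python
ALPHA = 1.6  # smoothing parameter in the sememe-distance formula (assumed value)

# Toy sememe hierarchy, encoded as each sememe's path from the root.
SEMEME_PATH = {
    "human":  ["entity", "animate", "human"],
    "animal": ["entity", "animate", "animal"],
    "stone":  ["entity", "inanimate", "stone"],
}

def sememe_sim(s1, s2):
    """Level 1: sim = ALPHA / (ALPHA + tree distance between the sememes)."""
    p1, p2 = SEMEME_PATH[s1], SEMEME_PATH[s2]
    common = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        common += 1
    dist = (len(p1) - common) + (len(p2) - common)
    return ALPHA / (ALPHA + dist)

# Levels 2-3: a word may map to several concepts (here one sememe each);
# word similarity is the best score over all concept pairs.
WORD_CONCEPTS = {"man": ["human"], "dog": ["animal"], "rock": ["stone"]}

def word_sim(w1, w2):
    return max(sememe_sim(c1, c2)
               for c1 in WORD_CONCEPTS[w1] for c2 in WORD_CONCEPTS[w2])

def sentence_sim(sent1, sent2):
    """Level 4: average, for each word, its best match in the other sentence."""
    def one_way(a, b):
        return sum(max(word_sim(w, v) for v in b) for w in a) / len(a)
    return (one_way(sent1, sent2) + one_way(sent2, sent1)) / 2
```

The real HowNet computation is richer (a concept is a structured set of sememes, with weighted combination across sememe roles), but the layering — sememe distance at the bottom, aggregation at each level above — is the same shape.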
SunRise_at 2012-04-12 16:19 Post a comment
]]>
</description></item><item><title>Natural language processing books and other resources</title><link>http://www.shnenglu.com/sunrise/archive/2012/03/28/169243.html</link><dc:creator>SunRise_at</dc:creator><author>SunRise_at</author><pubDate>Wed, 28 Mar 2012 01:35:00 GMT</pubDate><guid>http://www.shnenglu.com/sunrise/archive/2012/03/28/169243.html</guid><wfw:comment>http://www.shnenglu.com/sunrise/comments/169243.html</wfw:comment><comments>http://www.shnenglu.com/sunrise/archive/2012/03/28/169243.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.shnenglu.com/sunrise/comments/commentRss/169243.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/sunrise/services/trackbacks/169243.html</trackback:ping><description><![CDATA[<p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; font-family: Arial; line-height: 26px; text-align: left; ">Especially recommended:<br />1. The "HMM学习最佳范例" (best examples for learning HMMs) series, full-text document<br />2. "Unconstrained Optimization", full-text document</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; font-family: Arial; line-height: 26px; text-align: left; ">I. Books:<br />1. "Speech and Language Processing", English 2nd edition<br />2. "Foundations of Statistical Natural Language Processing", English edition<br />3. "Natural Language Processing with Python", the companion book to NLTK<br />4. "Learning Python", 3rd edition — a classic Python introduction, thorough to a fault<br />5. "Pattern Recognition in Natural Language Processing"<br />6. "The EM Algorithm and Extensions"<br />7. "The Elements of Statistical Learning"<br />8. "Natural Language Understanding", English edition (apparently only the first 9 chapters)<br />9. "Fundamentals of Speech Recognition" (the scan quality is not great, but Chapter 6 on HMMs is quite detailed, and one of the authors is Lawrence Rabiner)<br />10. A classic introduction to probability and statistics: "An Introduction to Probability Theory and Its Applications" (English edition, by William Feller): Volume 1, Volume 2, plus the DjVuLibre reader (needed to read these two volumes)<br />11. An introductory book on NLP with Perl and Prolog: "An Introduction to Language Processing with Perl and Prolog"<br />12. Machine learning books from abroad:<br /> 1) "Programming Collective Intelligence" — a good recent introduction to machine learning &amp; data mining; building interest is the most important step, and starting with a heavy tome easily scares people off.<br /> 2) "Machine Learning" — the undisputed classic of the field (change the downloaded file's extension to .pdf). Douban review (by Wang Ning): "An old book by a master. The content does not look deep nowadays and many chapters only scratch the surface, but it is very suitable for beginners (though not so 'new' that you know no algorithms or probability). The decision-tree part, for example, is excellent, and there has been no major progress there in recent years, so it is not outdated. The book is also a grand survey of decades of machine learning work before 1997, and its reference list is extremely valuable. There are Chinese translated and reprinted editions; I wonder if they are out of print."<br /> 3) "Introduction to Machine Learning"<br />13. Data mining books from abroad:<br /> 1) "Data Mining: Concepts and Techniques", 2nd edition — a data mining classic by Jiawei Han / Micheline Kamber, Morgan Kaufmann. Comment: written by Chinese-American scientists, remarkably accessible.<br /> 2) "Data Mining: Practical Machine Learning Tools and Techniques"<br /> 3) "Beautiful Data: The Stories Behind Elegant Data Solutions" (Toby Segaran, Jeff Hammerbacher)<br />14. Pattern recognition books from abroad:<br /> 1) "Pattern Recognition"<br /> 2) "Pattern Recognition Technologies and Applications"<br /> 3) "An Introduction to Pattern Recognition"<br /> 4) "Introduction to Statistical Pattern Recognition"<br /> 5) "Statistical Pattern Recognition", 2nd edition<br /> 6) "Supervised and Unsupervised Pattern Recognition"<br /> 7) "Support Vector Machines for Pattern Classification"<br />15. Artificial intelligence books from abroad:<br /> 1) "Artificial Intelligence: A Modern Approach" (2nd edition) — the undisputed classic of the AI field.<br /> 2) "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp"<br />16. Other related books:<br /> 1) "Programming the Semantic Web" — Toby Segaran, Colin Evans, Jamie Taylor<br /> 2) "Learning Python", 4th edition, English</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; font-family: Arial; line-height: 26px; text-align: left; ">II. Courseware:<br />1. "Statistical Natural Language Processing" slides by Liu Ting (Harbin Institute of Technology);<br />2. "Natural Language Processing" slides by Liu Bingquan (Harbin Institute of Technology);<br />3. "Lectures on Computational Linguistics" slides by Liu Qun (Institute of Computing Technology, CAS);<br />4. "Natural Language Understanding" slides by Zong Chengqing (Institute of Automation, CAS);<br />5. "Computational Linguistics" slides by Chang Baobao (Peking University);<br />6. "Foundations of Chinese Information Processing" slides and related code by Zhan Weidong (Peking University);<br />7. "Natural Language Processing" slides by Prof. Regina Barzilay (MIT); several chapters have been translated on 52nlp;<br />8. "Machine Learning Approaches for Natural Language Processing" slides by MIT's Michael Collins;<br />9. "Machine Learning" slides by Michael Collins;<br />10. "Advanced Natural Language Processing" slides by SMT expert Philipp Koehn;<br />11. "Empirical Methods in Natural Language Processing" slides by Philipp Koehn;<br />12. "Machine Translation" slides by Philipp Koehn;</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; font-family: Arial; line-height: 26px; text-align: left; ">III. Language resources and open-source tools:<br />1. The Brown corpus:<br /> a) the Brown corpus in XML format, with POS tags;<br /> b) the Brown corpus in plain-text format, with POS tags;<br /> c) merged, with blank lines and leading whitespace removed, for POS-tagger training: browntest.zip<br />2. The list of corpus resources provided officially by NLTK<br />3. The list of open-source NLP tools on OpenNLP<br />4. The "statistical NLP and corpus-based computational linguistics resource list" maintained by the Stanford NLP group<br />5. Free Chinese-information-processing resources on LDC<br />6. Chinese word segmentation resources:<br /> 1) A Java version of MMSEG: mmseg-v0.3.zip, by solol; for details see the "getting started with Chinese word segmentation" series.<br /> 2) ICTCLAS2010 by Zhang Huaping; this version is free for non-commercial use for one year. Download:<br />http://cid-51de2738d3ea0fdd.skydrive.live.com/self.aspx/.Public/ICTCLAS2010-packet-release.rar<br />7. A batch of news corpora contributed by devoted reader "finallyliuyu", covering Tencent, Sina, NetEase, Phoenix and others, currently hosted on CSDN: http://finallyliuyu.download.csdn.net/<br />  In August 2010 finallyliuyu also provided a batch of text-classification corpora; for details see "Chinese news classification corpora for amateur NLP enthusiasts, part 2"</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; font-family: Arial; line-height: 26px; text-align: left; ">IV. Papers:<br />1. The complete ACL-IJCNLP 2009 proceedings:<br /> a) Volume 1 of the full papers<br /> b) Volume 2 of the full papers<br /> c) The short papers<br /> d) The EMNLP 2009 proceedings from ACL09<br /> e) All ACL09 workshop papers</p><img src ="http://www.shnenglu.com/sunrise/aggbug/169243.html" width = "1" height = "1" /><br><br><div align=right><a 
style="text-decoration:none;" href="http://www.shnenglu.com/sunrise/" target="_blank">SunRise_at</a> 2012-03-28 09:35 <a href="http://www.shnenglu.com/sunrise/archive/2012/03/28/169243.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>
</body>