
            Morya

            [Translation] ANTLR 3

            ANTLR 3

            by
            R. Mark Volkmann, Partner/Software Engineer
            Object Computing, Inc. (OCI)
            Translated by: Morya

            Preface

            ANTLR is a big topic, so this is a big article. The table of contents that follows contains hyperlinks to allow easy navigation to the many topics discussed. Topics are introduced in the order in which understanding them is essential to the example code that follows. Your questions and feedback are welcomed at mark@ociweb.com.

            Table of Contents


            Part I - Overview

            Introduction to ANTLR

            ANTLR is a free, open source parser generator tool that is used to implement "real" programming languages and domain-specific languages (DSLs). The name stands for ANother Tool for Language Recognition. Terence Parr, a professor at the University of San Francisco, implemented (in Java) and maintains it. It can be downloaded from http://www.antlr.org. This site also contains documentation, articles, examples, a Wiki and information about mailing lists.

            ANTLR home page

            Many people feel that ANTLR is easier to use than other, similar tools. One reason for this is the syntax it uses to express grammars. Another is the existence of a graphical grammar editor and debugger called ANTLRWorks. Jean Bovet, a former master's student at the University of San Francisco who worked with Terence, implemented it (using Java Swing) and maintains it.

            A brief word about conventions in this article... ANTLR grammar syntax makes frequent use of the characters [ ] and { }. When describing a placeholder we will use italics rather than surrounding it with { }. When describing something that's optional, we'll follow it with a question mark rather than surrounding it with [ ].

            ANTLR Overview

            ANTLR uses Extended Backus-Naur Form (EBNF) grammars, which can directly express optional and repeated elements. BNF grammars require a more verbose syntax to express these. EBNF grammars also support "subrules", which are parenthesized groups of elements.

            ANTLR supports infinite lookahead for selecting the rule alternative that matches the portion of the input stream being evaluated. The technical way of stating this is that ANTLR supports LL(*). An LL(k) parser is a top-down parser that parses from left to right, constructs a leftmost derivation of the input and looks ahead k tokens when selecting between rule alternatives. The * means any number of lookahead tokens. Another type of parser, LR(k), is a bottom-up parser that parses from left to right and constructs a rightmost derivation of the input. LL parsers can't handle left-recursive rules, so those must be avoided when writing ANTLR grammars. Most people find LL grammars easier to understand than LR grammars. See Wikipedia for more detailed descriptions of LL and LR parsers.
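
To make the left-recursion restriction concrete, here is a minimal hand-written sketch in Java. It is not ANTLR-generated code, and the class and method names are invented for illustration. A left-recursive rule such as `expr : expr '+' term | term` would make a top-down parser call itself without consuming any input, so the sketch assumes the rule has been rewritten in EBNF as `expr : term (('+' | '-') term)*`. One token of lookahead is enough to decide whether to keep looping.

```java
import java.util.List;

// Hypothetical LL(1) recursive-descent evaluator for
//   expr : term (('+' | '-') term)* ;
// where a term is a single integer token. Not produced by ANTLR.
class Ll1Sketch {
    private final List<String> tokens;
    private int pos = 0;

    Ll1Sketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Look ahead one token without consuming it; null means end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String consume() {
        return tokens.get(pos++);
    }

    // expr : term (('+' | '-') term)* ;
    int expr() {
        int value = term();
        // The single lookahead token selects between "another operator" and "done".
        while ("+".equals(peek()) || "-".equals(peek())) {
            String op = consume();
            int right = term();
            value = op.equals("+") ? value + right : value - right;
        }
        return value;
    }

    // term : INTEGER ;
    int term() {
        if (peek() == null) {
            throw new IllegalStateException("unexpected end of input");
        }
        return Integer.parseInt(consume());
    }

    static int evaluate(List<String> tokens) {
        return new Ll1Sketch(tokens).expr();
    }
}
```

Calling `Ll1Sketch.evaluate` on the tokens `7 - 2 + 10` applies the operators from left to right, which is the grouping the left-recursive form would have expressed.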

            ANTLR supports three kinds of predicates that aid in resolving ambiguities. These allow rules that are not based strictly on input syntax.

            While ANTLR is implemented in Java, it generates code in many target languages including Java, Ruby, Python, C, C++, C# and Objective C.

            There are IDE plug-ins available for working with ANTLR inside IDEA and Eclipse, but not yet for NetBeans or other IDEs.

            Use Cases

            There are three primary use cases for ANTLR.

            The first is implementing "validators." These generate code that validates that input obeys grammar rules.

            The second is implementing "processors." These generate code that validates and processes input. They can perform calculations, update databases, read configuration files into runtime data structures, etc. Our Math example coming up is an example of a processor.

            The third is implementing "translators." These generate code that validates and translates input into another format such as a programming language or bytecode.

            Later we'll discuss "actions" and "rewrite rules." It's useful to point out where these are used in the three use cases above. Grammars for validators don't use actions or rewrite rules. Grammars for processors use actions, but not rewrite rules. Grammars for translators use actions (containing printlns) and/or rewrite rules.

            Other DSL Approaches

            Dynamic languages like Ruby and Groovy can be used to implement many DSLs. However, when they are used, the DSLs have to live within the syntax rules of the language. For example, such DSLs often require dots between object references and method names, parameters separated by commas, and blocks of code surrounded by curly braces or do/end keywords. Using a tool like ANTLR to implement a DSL provides maximum control over the syntax of the DSL.

            Definitions

            Lexer
            converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more)
            Parser
            processes a stream of tokens, possibly creating an AST
            Abstract Syntax Tree (AST)
            an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times
            Tree Parser
            processes an AST
            StringTemplate
            a library that supports using templates with placeholders for outputting text (e.g. Java source code)

            An input character stream is fed into the lexer. The lexer converts this to a stream of tokens that is fed to the parser. The parser often constructs an AST, which is fed to the tree parser. The tree parser processes the AST and optionally produces text output, possibly using StringTemplate.

            ANTLR flow

            General Steps

            The general steps involved in using ANTLR include the following.

            • Write the grammar using one or more files.
              A common approach is to use three grammar files, each focusing on a specific aspect of the processing. The first is the lexer grammar, which creates tokens from text input. The second is the parser grammar, which creates an AST from tokens. The third is the tree parser grammar, which processes an AST. This results in three relatively simple grammar files as opposed to one complex grammar file.
            • Optionally write StringTemplate templates for producing output.
            • Debug the grammar using ANTLRWorks.
            • Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language "actions" specified in the grammar.
            • Write an application that uses the generated classes.
            • Feed the application text that conforms to the grammar.

            Part II - Jumping In

            Example Description

            Enough background information. Let's create a language!

            Here's a list of features we want our language to have:

            • run on a file or interactively
            • get help using ? or help
            • support a single data type: double
            • assign values to variables using syntax like a = 3.14
            • define polynomial functions using syntax like f(x) = 3x^2 - 4x + 2
            • print strings, numbers, variables and function evaluations using syntax like
              print "The value of f for " a " is " f(a)
            • print the definition of a function and its derivative using syntax like
              print "The derivative of " f() " is " f'()
            • list variables and functions using list variables and list functions
            • add and subtract functions using syntax like h = f - g
              (note that the variables used in the functions do not have to match)
            • exit using exit or quit

            Here's some example input.

             

            a = 3.14
            f(x) = 3x^2 - 4x + 2
            print "The value of f for " a " is " f(a)
            print "The derivative of " f() " is " f'()
            list variables
            list functions
            g(y) = 2y^3 + 6y - 5
            h = f + g
            print h()

             

            Here's the output that would be produced.

            The value of f for 3.14 is 19.0188
            The derivative of f(x) = 3x^2 - 4x + 2 is f'(x) = 6x - 4
            # of variables defined: 1
            a = 3.14
            # of functions defined: 1
            f(x) = 3x^2 - 4x + 2
            h(x) = 2x^3 + 3x^2 + 2x - 3
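
The numbers in this expected output can be checked independently of any grammar. The sketch below is a hypothetical helper, not part of the article's code. It assumes a polynomial is stored as an array of coefficients indexed by exponent, so 3x^2 - 4x + 2 becomes {2, -4, 3}. It covers the three operations the language needs: evaluation, differentiation and addition.

```java
// Hypothetical helper, not from the article: polynomials are coefficient arrays
// indexed by exponent, so 3x^2 - 4x + 2 is {2, -4, 3}.
class PolynomialSketch {
    // Evaluate with Horner's rule, starting from the highest exponent.
    static double evaluate(double[] coeffs, double x) {
        double result = 0;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            result = result * x + coeffs[i];
        }
        return result;
    }

    // The derivative of c*x^n is (c*n)*x^(n-1); the constant term drops out.
    static double[] derivative(double[] coeffs) {
        if (coeffs.length <= 1) {
            return new double[] {0};
        }
        double[] result = new double[coeffs.length - 1];
        for (int n = 1; n < coeffs.length; n++) {
            result[n - 1] = coeffs[n] * n;
        }
        return result;
    }

    // Add term by term, treating missing higher-order terms as zero.
    static double[] add(double[] a, double[] b) {
        double[] result = new double[Math.max(a.length, b.length)];
        for (int i = 0; i < result.length; i++) {
            double left = i < a.length ? a[i] : 0;
            double right = i < b.length ? b[i] : 0;
            result[i] = left + right;
        }
        return result;
    }
}
```

With f as {2, -4, 3} and g as {-5, 6, 0, 2}, evaluating f at 3.14 gives approximately 19.0188, the derivative of f is {-4, 6} (that is, 6x - 4), and f + g is {-3, 2, 3, 2} (that is, 2x^3 + 3x^2 + 2x - 3), in agreement with the output above.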

            Here's the AST we'd like to produce for the input above, drawn by ANTLRWorks. It's split into three parts because the image is really wide. The "nil" root node is automatically supplied by ANTLR. Note the horizontal line under the "nil" root node that connects the three graphics. Nodes with uppercase names are "imaginary nodes" added for the purpose of grouping other nodes. We'll discuss those in more detail later.


            example ast diagram, part 1


            example ast diagram, part 2


            example ast diagram, part 3

            Important Classes

            The diagram below shows the relationships between the most important classes used in this example.

            Important classes

            Note the key in the upper-left corner of the diagram that distinguishes between classes provided by ANTLR, classes generated from our grammar by ANTLR, and classes we wrote manually.

            Grammar Syntax

            The syntax of an ANTLR grammar is described below.

            grammar-type? grammar grammar-name;
            grammar-options?
            token-spec?
            attribute-scopes?
            grammar-actions?
            rule+

            Comments in an ANTLR grammar use the same syntax as Java. There are three types of grammars: lexer, parser and tree. If a grammar-type isn't specified, it defaults to a combined lexer and parser. The name of the file containing the grammar must match the grammar-name and have a ".g" extension. The classes generated by ANTLR will contain a method for each rule in the grammar. Each of the elements of grammar syntax above will be discussed in the order they are needed to implement our Math language.

            Grammar Options

            Grammar options include the following:

            AST node type - ASTLabelType = CommonTree
            This is used in grammars that create or parse ASTs. CommonTree is a provided class. It is also possible to use your own class to represent AST nodes.
            infinite lookahead - backtrack = true
            This provides infinite lookahead for all rules. Parsing is slower with this on.
            limited lookahead - k = integer
            This limits lookahead to a given number of tokens.
            output type - output = AST | template
            Choose AST when the grammar creates an AST and template when using the StringTemplate library.
            Don't set this when not producing output or doing so with printlns in actions.
            token vocabulary - tokenVocab = grammar-name
            This allows one grammar file to use tokens defined in another (with lexer rules or a token spec.). It reads generated .tokens files.

            Grammar options are specified using the following syntax. Note that quotes aren't needed around single word values.

             

            options {
              name = 'value';
              ...
            }

             

            Grammar Actions

            Grammar actions add code to the generated code. There are three places where code can be added by a grammar action.

            1. Before the generated class definition:
              This is commonly used to specify a Java package name and import classes in other packages. The syntax for adding code here is @header { ... }. In a combined lexer/parser grammar, this only affects the generated parser class. To affect the generated lexer class, use @lexer::header { ... }.
            2. Inside the generated class definition:
              This is commonly used to define constants, attributes and methods accessible to all rule methods in the generated classes. It can also be used to override methods in the superclasses of the generated classes.
              The syntax for adding code here is @members { ... }. In a combined lexer/parser grammar, this only affects the generated parser class. To affect the generated lexer class, use @lexer::members { ... }.
            3. Inside generated methods:
              The catch blocks for the try block in the methods generated for each rule can be customized. One use for this is to stop processing after the first error is encountered rather than attempting to recover by skipping unrecognized tokens.
              The syntax for adding catch blocks is @rulecatch { catch-blocks }.

            Part III - Lexers

            Lexer Rules

            A lexer rule or token specification is needed for every kind of token to be processed by the parser grammar. The names of lexer rules must start with an uppercase letter and are typically all uppercase. A lexer rule can be associated with:

            • a single literal string expected in the input
            • a selection of literal strings that may be found
            • a sequence of specific characters and ranges of characters using the cardinality indicators ?, * and +

            A lexer rule cannot be associated with a regular expression.

            When the lexer chooses the next lexer rule to apply, it chooses the one that matches the most characters. If there is a tie then the one listed first is used, so order matters.
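
The selection rule just described (longest match wins, and a tie goes to the rule listed first) can be sketched in a few lines of Java. This only illustrates the rule as stated here and is not how ANTLR's generated lexer is implemented. The class name and the two rules are hypothetical, and regular expressions stand in for lexer rules purely to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match wins; ties go to the rule listed first".
// Regular expressions are an assumption of this sketch, not an ANTLR feature.
class LongestMatchSketch {
    // LinkedHashMap preserves insertion order, which models rule order in a grammar.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LIST", Pattern.compile("list"));
        RULES.put("NAME", Pattern.compile("[A-Za-z][A-Za-z0-9_]*"));
    }

    // Returns the name of the rule chosen for the text starting at the given offset,
    // or null when no rule matches there.
    static String chooseRule(String input, int offset) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(offset, input.length());
            // Strictly greater: on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() - offset > bestLength) {
                best = rule.getKey();
                bestLength = m.end() - offset;
            }
        }
        return best;
    }
}
```

For the input `list` both rules match four characters, so the earlier rule, LIST, is chosen. For `listing` the NAME rule matches more characters and wins.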

            A lexer rule can refer to other lexer rules. Often they reference "fragment" lexer rules. These do not result in creation of tokens and are only present to simplify the definition of other lexer rules. In the example ahead, LETTER and DIGIT are fragment lexer rules.

            Whitespace and Comments

            Whitespace and comments in the input are handled in lexer rules. There are two common options for handling these: either throw them away or write them to a different "channel" that is not automatically inspected by the parser. To throw them away, use "skip();". To write them to the special "hidden" channel, use "$channel = HIDDEN;".
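
The difference between the two options can be modeled without ANTLR. In the hypothetical sketch below, skipped whitespace never becomes a token at all, while hidden-channel whitespace is kept in the token list and simply filtered out of what the parser sees. The class, the channel numbers and the single-space tokenizer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: skipped text never becomes a token, while hidden-channel
// tokens are kept but filtered out of the parser's view.
class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1; // illustrative value, not ANTLR's constant

    static final class Token {
        final String text;
        final int channel;

        Token(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Splits on single spaces; each space becomes a hidden token or nothing at all.
    static List<Token> lex(String input, boolean skipWhitespace) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                if (word.length() > 0) {
                    tokens.add(new Token(word.toString(), DEFAULT_CHANNEL));
                    word.setLength(0);
                }
                if (!skipWhitespace) {
                    tokens.add(new Token(" ", HIDDEN_CHANNEL));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), DEFAULT_CHANNEL));
        }
        return tokens;
    }

    // What a parser would see: only tokens on the default channel.
    static List<String> parserView(List<Token> tokens) {
        List<String> visible = new ArrayList<>();
        for (Token t : tokens) {
            if (t.channel == DEFAULT_CHANNEL) {
                visible.add(t.text);
            }
        }
        return visible;
    }
}
```

Either way the parser view is the same. The hidden-channel variant is useful when a later stage, such as a pretty-printer, still needs the original whitespace.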

            Here are examples of lexer rules that handle whitespace and comments.

             

            // Send runs of space and tab characters to the hidden channel.
            WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };

            // Treat runs of newline characters as a single NEWLINE token.
            // On some platforms, newlines are represented by a \n character.
            // On others they are represented by a \r and a \n character.
            NEWLINE: ('\r'? '\n')+;

            // Single-line comments begin with //, are followed by any characters
            // other than those in a newline, and are terminated by newline characters.
            SINGLE_COMMENT: '//' ~('\r' | '\n')* NEWLINE { skip(); };

            // Multi-line comments are delimited by /* and */
            // and are optionally followed by newline characters.
            MULTI_COMMENT options { greedy = false; }
              : '/*' .* '*/' NEWLINE? { skip(); };

             

            When the greedy option is set to true, the lexer matches as much input as possible. When false, it stops when input matches the next element in the lexer rule. The greedy option defaults to true except when the patterns ".*" and ".+" are used. For this reason, it didn't need to be specified in the example above.
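
Java regular expressions make the same distinction, which gives a quick way to see why a multi-line comment rule needs non-greedy matching. The sketch below is an analogy only: it uses `java.util.regex`, not ANTLR lexer rules, and the class name is hypothetical. The greedy quantifier `.*` runs to the last `*/` in the input, while the reluctant `.*?` stops at the first one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: Java regular expressions, not ANTLR lexer rules.
class GreedySketch {
    // Returns the first block comment matched in the input, or null if there is none.
    static String firstComment(String input, boolean greedy) {
        // ".*" consumes as much as possible; ".*?" stops at the first "*/".
        String body = greedy ? ".*" : ".*?";
        Pattern p = Pattern.compile("/\\*" + body + "\\*/", Pattern.DOTALL);
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

Given `/* a */ x = 1 /* b */`, the non-greedy form returns only `/* a */`, whereas the greedy form swallows the whole line, including the statement between the two comments.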

            If newline characters are to be used as statement terminators then they shouldn't be skipped or hidden since the parser needs to see them.

            Our Lexer Grammar

            lexer grammar MathLexer;
            // We want the generated lexer class to be in this package.
            @header { package com.ociweb.math; }
            APOSTROPHE: '\''; // for derivative
            ASSIGN: '=';
            CARET: '^'; // for exponentiation
            FUNCTIONS: 'functions'; // for list command
            HELP: '?' | 'help';
            LEFT_PAREN: '(';
            LIST: 'list';
            PRINT: 'print';
            RIGHT_PAREN: ')';
            SIGN: '+' | '-';
            VARIABLES: 'variables'; // for list command

            NUMBER: INTEGER | FLOAT;
            fragment FLOAT: INTEGER '.' '0'..'9'+;
            fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
            NAME: LETTER (LETTER | DIGIT | '_')*;
            STRING_LITERAL: '"' NONCONTROL_CHAR* '"';

            fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL | SPACE;
            fragment LETTER: LOWER | UPPER;
            fragment LOWER: 'a'..'z';
            fragment UPPER: 'A'..'Z';
            fragment DIGIT: '0'..'9';
            fragment SPACE: ' ' | '\t';

            // Note that SYMBOL does not include the double-quote character.
            fragment SYMBOL: '!' | '#'..'/' | ':'..'@' | '['..'`' | '{'..'~';

            // Windows uses \r\n. UNIX and Mac OS X use \n.
            // To use newlines as a terminator,
            // they can't be written to the hidden channel!
            NEWLINE: ('\r'? '\n')+;
            WHITESPACE: SPACE+ { $channel = HIDDEN; };

            We'll be looking at the parser grammar soon. When parser rule alternatives contain literal strings, they are converted into references to automatically generated lexer rules. For example, we could eliminate the ASSIGN lexer rule above and change ASSIGN to '=' in the parser grammar.


            Part IV - Parsers

            Token Specifications

            The lexer creates tokens for all input character sequences that match lexer rules. It can be useful to create other tokens that either don't exist in the input (imaginary) or have a better name than what is found in the input. Imaginary tokens are often used to group other tokens. In the parser grammar ahead, the tokens that play this role are DEFINE, POLYNOMIAL, TERM, FUNCTION, DERIVATIVE and COMBINE.

            The syntax for specifying these kinds of tokens in a parser grammar is:

             

            tokens {
              imaginary-name;
              better-name = 'input-name';
            }

             

            Rule Syntax

            The syntax for defining rules is

            fragment? rule-name arguments?
            (returns return-values)?
            throws-spec?
            rule-options?
            rule-attribute-scopes?
            rule-actions?
              : token-sequence-1
              | token-sequence-2
              ...
              ;
            exceptions-spec?

            The fragment keyword only appears at the beginning of lexer rules that are used as fragments (described earlier).

            Rule options include backtrack and k which customize those options for a specific rule instead of using the grammar-wide values specified as grammar options. They are specified using the syntax options { ... }.

            The token sequences are alternatives that can be selected by the rule. Each element in the sequences can be followed by an action which is target language code (such as Java) in curly braces. The code is executed immediately after a preceding element is matched by input.

            The optional exceptions-spec customizes exception handling for this rule.

            Elements in a token sequence can be assigned to variables so they can be accessed in actions. To obtain the text value of a token that is referred to by a variable, use $variable.text. There are several examples of this in the parser grammar that follows.

            Creating ASTs

            Parser grammars often create ASTs. To do this, the grammar option output must be set to AST.

            There are two approaches for creating ASTs. The first is to use "rewrite rules". These appear after a rule alternative. This is the recommended approach in most cases. The syntax of a rewrite rule is
            -> ^(parent child-1 child-2 ... child-n)

            The second approach for creating ASTs is to use AST operators. These appear in a rule alternative, immediately after tokens. They work best for sequences like mathematical expressions. There are two AST operators. When a ^ is used, a new root node is created for all child nodes at the same level. When a ! is used, no node is created. This is often used for bits of syntax that aren't needed in the AST such as parentheses, commas and semicolons. When a token isn't followed by one of them, a new child node is created for that token using the current root node as its parent.
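
To picture what either approach produces, the following hypothetical Java class models a tree node and prints it in a LISP-like form, with the parent first and its children after it. It is a stand-in written for this illustration and does not use ANTLR's CommonTree. The example tree corresponds to the rewrite `-> ^(ASSIGN NAME value)` applied to the input `a = 3.14`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST node; ANTLR's own CommonTree is not used here.
class AstSketch {
    final String text;
    final List<AstSketch> children = new ArrayList<>();

    AstSketch(String text, AstSketch... kids) {
        this.text = text;
        for (AstSketch kid : kids) {
            children.add(kid);
        }
    }

    // A leaf prints its text; a parent prints "(text child child ...)".
    String toStringTree() {
        if (children.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (AstSketch child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }

    // Models the rewrite rule  -> ^(ASSIGN NAME value)  for the input "a = 3.14".
    static AstSketch assignExample() {
        return new AstSketch("=", new AstSketch("a"), new AstSketch("3.14"));
    }
}
```

The assignment token becomes the root and the name and value become its children, so the tree prints as `(= a 3.14)`.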

            A rule can use both of these approaches, but each rule alternative can only use one approach.

            Rule Arguments and Return Values

            The following syntax is used to declare rule arguments and return types.

            rule-name[type1 name1, type2 name2, ...]
            returns [type1 name1, type2 name2, ...] :
              ...
              ;

            The names after the rule name are arguments and the names after the returns keyword are return values.

            Note that rules can return more than one value. ANTLR generates a class to use as the return type of the generated method for the rule. Instances of this class hold all the return values. The generated method name matches the rule name. The name of the generated return type class is the rule name with "_return" appended.
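
As a rough picture of that shape, suppose a hypothetical rule is declared as `minMax returns [int min, int max]`. The sketch below shows a holder class named after the rule with one field per return value, and a method named after the rule that fills it in. It is an assumption-laden imitation: real generated code extends classes from the ANTLR runtime and parses tokens, whereas here the "parsing" is faked by scanning an int array so that the example runs on its own.

```java
// Hypothetical imitation of the shape described above for a rule declared as
//   minMax returns [int min, int max] : ... ;
// Real generated code extends an ANTLR runtime class and does actual parsing.
class ReturnValueSketch {
    // One field per declared return value; the name is the rule name plus "_return".
    static class minMax_return {
        public int min;
        public int max;
    }

    // The method name matches the rule name. An int array stands in for the
    // token stream so that this sketch has no ANTLR dependency.
    static minMax_return minMax(int[] values) {
        minMax_return retval = new minMax_return();
        retval.min = Integer.MAX_VALUE;
        retval.max = Integer.MIN_VALUE;
        for (int v : values) {
            retval.min = Math.min(retval.min, v);
            retval.max = Math.max(retval.max, v);
        }
        return retval;
    }
}
```

A caller reads the individual values from the returned object, for example `minMax(values).max`.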

            Our Parser Grammar

            parser grammar MathParser;
            options {
              // We're going to output an AST.
              output = AST;

              // We're going to use the tokens defined in our MathLexer grammar.
              tokenVocab = MathLexer;
            }
            // These are imaginary tokens that will serve as parent nodes
            // for grouping other tokens in our AST.
            tokens {
              COMBINE;
              DEFINE;
              DERIVATIVE;
              FUNCTION;
              POLYNOMIAL;
              TERM;
            }
            // We want the generated parser class to be in this package.
            @header { package com.ociweb.math; }
            // This is the "start rule".
            // EOF is a predefined token that represents the end of input.
            // The "start rule" should end with this.
            // Note the use of the ! AST operator
            // to avoid adding the EOF token to the AST.
            script: statement* EOF!;
            statement: assign | define | interactiveStatement | combine | print;
            // These kinds of statements only need to be supported
            // when reading input from the keyboard.
            interactiveStatement: help | list;
            // Examples of input that match this rule include
            // "a = 19", "a = b", "a = f(2)" and "a = f(b)".
            assign: NAME ASSIGN value terminator -> ^(ASSIGN NAME value);
            value: NUMBER | NAME | functionEval;
            // A parenthesized group in a rule alternative is called a "subrule".
            // Examples of input that match this rule include "f(2)" and "f(b)".
            functionEval
              : fn=NAME LEFT_PAREN (v=NUMBER | v=NAME) RIGHT_PAREN -> ^(FUNCTION $fn $v);
            // EOF cannot be used in lexer rules, so we made this a parser rule.
            // EOF is needed here for interactive mode where each line entered ends in EOF
            // and for file mode where the last line ends in EOF.
            terminator: NEWLINE | EOF;
            // Examples of input that match this rule include
            // "f(x) = 3x^2 - 4" and "g(x) = y^2 - 2y + 1".
            // Note that two parameters are passed to the polynomial rule.
            define
              : fn=NAME LEFT_PAREN fv=NAME RIGHT_PAREN ASSIGN
                polynomial[$fn.text, $fv.text] terminator
                -> ^(DEFINE $fn $fv polynomial);
            // Examples of input that match this rule include
            // "3x^2 - 4" and "y^2 - 2y + 1".
            // fnt = function name text; fvt = function variable text
            // Note that two parameters are passed in each invocation of the term rule.
            polynomial[String fnt, String fvt]
              : term[$fnt, $fvt] (SIGN term[$fnt, $fvt])*
                -> ^(POLYNOMIAL term (SIGN term)*);
            // Examples of input that match this rule include
            // "4", "4x", "x^2" and "4x^2".
            // fnt = function name text; fvt = function variable text
            term[String fnt, String fvt]
              // tv = term variable
              : c=coefficient? (tv=NAME e=exponent?)?
                // What follows is a validating semantic predicate.
                // If it evaluates to false, a FailedPredicateException will be thrown.
                // It is testing whether the term variable matches the function variable.
                { tv == null ? true : ($tv.text).equals($fvt) }?
                -> ^(TERM $c? $tv? $e?)
              ;
              // This catches bad function definitions such as
              // f(x) = 2y
              catch [FailedPredicateException fpe] {
                String tvt = $tv.text;
                String msg = "In function \"" + fnt +
                  "\" the term variable \"" + tvt +
                  "\" doesn't match function variable \"" + fvt + "\".";
                throw new RuntimeException(msg);
              }
            coefficient: NUMBER;
            // An example of input that matches this rule is "^2".
            exponent: CARET NUMBER -> NUMBER;
            // Inputs that match this rule are "?" and "help".
            help: HELP terminator -> HELP;
            // Inputs that match this rule include
            // "list functions" and "list variables".
            list
              : LIST listOption terminator -> ^(LIST listOption);
            // Inputs that match this rule are "functions" and "variables".
            listOption: FUNCTIONS | VARIABLES;
            // Examples of input that match this rule include
            // "h = f + g" and "h = f - g".
            combine
              : fn1=NAME ASSIGN fn2=NAME op=SIGN fn3=NAME terminator
                -> ^(COMBINE $fn1 $op $fn2 $fn3);
            // An example of input that matches this rule is
            // print "f(" a ") = " f(a)
            print
              : PRINT printTarget* terminator -> ^(PRINT printTarget*);
            // Examples of input that match this rule include
            // 19, 3.14, "my text", a, f(), f(2), f(a) and f'().
            printTarget
              : NUMBER -> NUMBER
              | sl=STRING_LITERAL -> $sl
              | NAME -> NAME
              // This is a function reference to print a string representation.
              | NAME LEFT_PAREN RIGHT_PAREN -> ^(FUNCTION NAME)
              | functionEval
              | derivative
              ;
            // An example of input that matches this rule is "f'()".
            derivative
              : NAME APOSTROPHE LEFT_PAREN RIGHT_PAREN -> ^(DERIVATIVE NAME);
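
The validating semantic predicate in the term rule above can be read as an ordinary boolean check followed by an exception when the check fails. The following plain-Java sketch mirrors that logic outside of ANTLR; the class and method names are hypothetical, and a RuntimeException stands in for the FailedPredicateException and catch block used in the grammar.

```java
// Plain-Java mirror of the term rule's predicate; not generated ANTLR code.
class PredicateSketch {
    // A term with no variable always passes; otherwise the term variable
    // must equal the function variable.
    static boolean termVariableMatches(String termVariable, String functionVariable) {
        return termVariable == null || termVariable.equals(functionVariable);
    }

    // Stand-in for the rule body plus its catch block: a failed check
    // is reported with the same message the grammar builds.
    static void checkTerm(String fnName, String termVariable, String functionVariable) {
        if (!termVariableMatches(termVariable, functionVariable)) {
            throw new RuntimeException("In function \"" + fnName
                + "\" the term variable \"" + termVariable
                + "\" doesn't match function variable \"" + functionVariable + "\".");
        }
    }

    public static void main(String[] args) {
        checkTerm("f", "x", "x");  // passes, as for f(x) = 2x
        checkTerm("f", null, "x"); // passes, as for f(x) = 4
        try {
            checkTerm("f", "y", "x"); // fails, as for f(x) = 2y
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```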

            Part V - Tree Parsers


            Rule Actions

            Rule actions add code before and/or after the generated code in the method generated for a rule. They can be used for AOP-like wrapping of methods. The syntax @init { ...code... } inserts the contained code before the generated code. The syntax @after { ...code... } inserts the contained code after the generated code. The tree grammar rules polynomial and term ahead demonstrate using @init.
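
As a rough picture of where the two actions land, the sketch below is a hand-written stand-in for a generated rule method. It is a simplification under stated assumptions: real generated methods contain considerably more code, and the trace list exists only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated rule method; not actual ANTLR output.
class RuleActionSketch {
    static final List<String> trace = new ArrayList<>();

    static void term() {
        trace.add("@init code");              // inserted before the generated code
        trace.add("generated matching code"); // stands in for what ANTLR generates
        trace.add("@after code");             // inserted after the generated code
    }

    public static void main(String[] args) {
        term();
        System.out.println(trace);
    }
}
```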


            Attribute Scopes


            Data is shared between rules in two ways: by passing parameters and/or returning values, or by using attributes. These are the same as the options for sharing data between Java methods in the same class. Attributes can be accessible to a single rule (using @init to declare them), a rule and all rules invoked by it (rule scope), or by all rules that request the named global scope of the attributes.


            Attribute scopes define collections of attributes that can be accessed by multiple rules. There are two kinds, global and rule scopes.


            Global scopes are named scopes that are defined outside any rule. To request access to a global scope within a rule, add scope name; to the rule. To access multiple global scopes, list their names separated by spaces. The following syntax is used to define a global scope.


            scope name {
              type variable;
              
            }

            Rule scopes are unnamed scopes that are defined inside a rule. Rule actions in the defining rule and rules invoked by it access attributes in the scope with $rule-name::variable . The following syntax is used to define a rule scope.


            scope {
              type variable;
            }

            To initialize an attribute, use an @init rule action.
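
One way to think about a rule scope is as a stack of attribute holders: the defining rule pushes a fresh holder on entry, rules it invokes read the holder on top, and the holder is popped on exit. The sketch below models that idea in plain Java. All names are hypothetical, and a StringBuilder stands in for the attribute; it is an illustration of the concept rather than a copy of generated code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual model of a rule scope; names are hypothetical.
class RuleScopeSketch {
    // Plays the role of "scope { ... }" declared in a rule named polynomial.
    static class PolynomialScope {
        StringBuilder current = new StringBuilder(); // hypothetical attribute
    }

    static final Deque<PolynomialScope> polynomialStack = new ArrayDeque<>();

    // The defining rule: push a fresh scope, invoke other rules, pop on exit.
    static String polynomial(String... terms) {
        polynomialStack.push(new PolynomialScope()); // also where @init code would run
        try {
            for (String t : terms) {
                term(t);
            }
            return polynomialStack.peek().current.toString();
        } finally {
            polynomialStack.pop();
        }
    }

    // An invoked rule reaches the attribute through the top of the stack,
    // which is the idea behind $polynomial::current in a grammar action.
    static void term(String text) {
        StringBuilder current = polynomialStack.peek().current;
        if (current.length() > 0) {
            current.append(" + ");
        }
        current.append(text);
    }

    public static void main(String[] args) {
        System.out.println(polynomial("3x^2", "4"));
    }
}
```

Using a stack rather than a single field means a recursive invocation of the defining rule would get its own attribute values instead of overwriting the outer ones.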


            Our Tree Grammar


            tree grammar MathTree;
            options {
              // We're going to process an AST whose nodes are of type CommonTree.
              ASTLabelType = CommonTree;
              // We're going to use the tokens defined in
              // both our MathLexer and MathParser grammars.
              // The MathParser grammar already includes
              // the tokens defined in the MathLexer grammar.
              tokenVocab = MathParser;
            }
            @header {
              // We want the generated parser class to be in this package.
              package com.ociweb.math;
              import java.util.Map;
              import java.util.TreeMap;
            }
            // We want to add some fields and methods to the generated class.
            @members {
              // We're using TreeMaps so the entries are sorted on their keys
              // which is desired when listing them.
              private Map<String, Function> functionMap = new TreeMap<String, Function>();
              private Map<String, Double> variableMap = new TreeMap<String, Double>();
              // This adds a Function to our function Map.
              private void define(Function function) {
                functionMap.put(function.getName(), function);
              }
              // This retrieves a Function from our function Map
              // whose name matches the text of a given AST tree node.
              private Function getFunction(CommonTree nameNode) {
                String name = nameNode.getText();
                Function function = functionMap.get(name);
                if (function == null) {
                  String msg = "The function \"" + name + "\" is not defined.";
                  throw new RuntimeException(msg);
                }
                return function;
              }
              // This evaluates a function whose name matches the text
              // of a given AST tree node for a given value.
              private double evalFunction(CommonTree nameNode, double value) {
                return getFunction(nameNode).getValue(value);
              }
              // This retrieves the value of a variable from our variable Map
              // whose name matches the text of a given AST tree node.
              private double getVariable(CommonTree nameNode) {
                String name = nameNode.getText();
                Double value = variableMap.get(name);
                if (value == null) {
                  String msg = "The variable \"" + name + "\" is not set.";
                  throw new RuntimeException(msg);
                }
                return value;
              }
              // This just shortens the code for print calls.
              private static void out(Object obj) {
                System.out.print(obj);
              }
              // This just shortens the code for println calls.
              private static void outln(Object obj) {
                System.out.println(obj);
              }
              // This converts the text of a given AST node to a double.
              private double toDouble(CommonTree node) {
                double value = 0.0;
                String text = node.getText();
                try {
                  value = Double.parseDouble(text);
                } catch (NumberFormatException e) {
                  throw new RuntimeException("Cannot convert \"" + text + "\" to a double.");
                }
                return value;
              }
              // This replaces all escaped newline characters in a String
              // with unescaped newline characters.
              // It is used to allow newline characters to be placed in
              // literal Strings that are passed to the print command.
              private static String unescape(String text) {
                return text.replaceAll("\\\\n", "\n");
              }
            } // @members
            script: statement*;
            statement: assign | combine | define | interactiveStatement | print;
            // These kinds of statements only need to be supported
            // when reading input from the keyboard.
            interactiveStatement: help | list;
            // This adds a variable to the map.
            // Parts of rule alternatives can be assigned to variables (ex. v)
            // that are used to refer to them in rule actions.
            // Alternatively rule names (ex. NAME) can be used.
            // We could have used $value in place of $v below.
            assign: ^(ASSIGN NAME v=value) { variableMap.put($NAME.text, $v.result); };
            // This returns a value as a double.
            // The value can be a number, a variable name or a function evaluation.
            value returns [double result]
              : NUMBER { $result = toDouble($NUMBER); }
              | NAME { $result = getVariable($NAME); }
              | functionEval { $result = $functionEval.result; }
              ;
            // This returns the result of a function evaluation as a double.
            functionEval returns [double result]
              : ^(FUNCTION fn=NAME v=NUMBER) {
                  $result = evalFunction($fn, toDouble($v));
                }
              | ^(FUNCTION fn=NAME v=NAME) {
                  $result = evalFunction($fn, getVariable($v));
                }
              ;
            // This builds a Function object and adds it to the function map.
            define
              : ^(DEFINE name=NAME variable=NAME polynomial) {
                  define(new Function($name.text, $variable.text, $polynomial.result));
                }
              ;
            // This builds a Polynomial object and returns it.
            polynomial returns [Polynomial result]
            // The "current" attribute in this rule scope is visible to
            // rules invoked by this one, such as term.
            scope { Polynomial current; }
            @init { $polynomial::current = new Polynomial(); }
              // There can be no sign in front of the first term,
              // so "" is passed to the term rule.
              // The coefficient of the first term can be negative.
              // The sign between terms is passed to
              // subsequent invocations of the term rule.
              : ^(POLYNOMIAL term[""] (s=SIGN term[$s.text])*) {
                  $result = $polynomial::current;
                }
              ;
            // This builds a Term object and adds it to the current Polynomial.
            term[String sign]
            @init { boolean negate = "-".equals(sign); }
              : ^(TERM coefficient=NUMBER) {
                  double c = toDouble($coefficient);
                  if (negate) c = -c; // applies sign to coefficient
                  $polynomial::current.addTerm(new Term(c));
                }
              | ^(TERM coefficient=NUMBER? variable=NAME exponent=NUMBER?) {
                  double c = coefficient == null ? 1.0 : toDouble($coefficient);
                  if (negate) c = -c; // applies sign to coefficient
                  double exp = exponent == null ? 1.0 : toDouble($exponent);
                  $polynomial::current.addTerm(new Term(c, $variable.text, exp));
                }
              ;
            // This outputs help on our language which is useful in interactive mode.
            help
              : HELP {
                  outln("In the help below");
                  outln("* fn stands for function name");
                  outln("* n stands for a number");
                  outln("* v stands for variable");
                  outln("");
                  outln("To define");
                  outln("* a variable: v = n");
                  outln("* a function from a polynomial: fn(v) = polynomial-terms");
                  outln("  (for example, f(x) = 3x^2 - 4x + 1)");
                  outln("* a function from adding or subtracting two others: " +
                    "fn3 = fn1 +|- fn2");
                  outln("  (for example, h = f + g)");
                  outln("");
                  outln("To print");
                  outln("* a literal string: print \"text\"");
                  outln("* a number: print n");
                  outln("* the evaluation of a function: print fn(n | v)");
                  outln("* the definition of a function: print fn()");
                  outln("* the derivative of a function: print fn'()");
                  outln("* multiple items on the same line: print i1 i2  in");
                  outln("");
                  outln("To list");
                  outln("* variables defined: list variables");
                  outln("* functions defined: list functions");
                  outln("");
                  outln("To get help: help or ?");
                  outln("");
                  outln("To exit: exit or quit");
                }
              ;
            // This lists all the functions or variables that are currently defined.
            list
              : ^(LIST FUNCTIONS) {
                  outln("# of functions defined: " + functionMap.size());
                  for (Function function : functionMap.values()) {
                    outln(function);
                  }
                }
              | ^(LIST VARIABLES) {
                  outln("# of variables defined: " + variableMap.size());
                  for (String name : variableMap.keySet()) {
                    double value = variableMap.get(name);
                    outln(name + " = " + value);
                  }
                }
              ;
            // This adds or subtracts two functions to create a new one.
            combine
              : ^(COMBINE fn1=NAME op=SIGN fn2=NAME fn3=NAME) {
                  Function f2 = getFunction(fn2);
                  Function f3 = getFunction(fn3);
                  if ("+".equals($op.text)) {
                    // "$fn1.text" is the name of the new function to create.
                    define(f2.add($fn1.text, f3));
                  } else if ("-".equals($op.text)) {
                    define(f2.subtract($fn1.text, f3));
                  } else {
                    // This should never happen since SIGN is defined to be either "+" or "-".
                    throw new RuntimeException(
                      "The operator \"" + $op.text +
                      "\" cannot be used for combining functions.");
                  }
                }
              ;
            // This prints a list of printTargets then prints a newline.
            print
              : ^(PRINT printTarget*)
                { System.out.println(); };
            // This prints a single printTarget without a newline.
            // "out", "unescape", "getVariable", "getFunction", "evalFunction"
            // and "toDouble" are methods we wrote that were defined
            // in the @members block earlier.
            printTarget
              : NUMBER { out($NUMBER); }
              | STRING_LITERAL {
                  String s = unescape($STRING_LITERAL.text);
                  out(s.substring(1, s.length() - 1)); // removes quotes
                }
              | NAME { out(getVariable($NAME)); }
              | ^(FUNCTION NAME) { out(getFunction($NAME)); }
              // The next line uses the return value named "result"
              // from the earlier rule named "functionEval".
              | functionEval { out($functionEval.result); }
              | derivative
              ;
            // This prints the derivative of a function.
            // This also could have been done in place in the printTarget rule.
            derivative
              : ^(DERIVATIVE NAME) {
                  out(getFunction($NAME).getDerivative());
                }
              ;
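
The unescape helper in the @members block relies on a regular expression with several levels of escaping, so it can be worth trying in isolation. The sketch below copies that one-line method into a standalone class (the class name is hypothetical) and applies it the way the printTarget rule does: unescape first, then strip the surrounding quotes.

```java
// Standalone copy of the unescape helper for experimentation.
class UnescapeSketch {
    static String unescape(String text) {
        // The Java literal "\\\\n" is the regex \\n, which matches a backslash
        // followed by the letter n. The replacement "\n" is a real newline.
        return text.replaceAll("\\\\n", "\n");
    }

    public static void main(String[] args) {
        // What the lexer would hand over for: print "line1\nline2"
        String literal = "\"line1\\nline2\"";
        String s = unescape(literal);
        System.out.println(s.substring(1, s.length() - 1)); // removes quotes
    }
}
```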

            Part VI - ANTLRWorks

            ANTLRWorks is a graphical grammar editor and debugger. It checks for grammar errors, including those beyond the syntax variety such as conflicting rule alternatives, and highlights them. It can display a syntax diagram for a selected rule. It provides a debugger that can step through creation of parse trees and ASTs.


            Rectangles in syntax diagrams correspond to fixed vocabulary symbols. Rounded rectangles correspond to variable symbols.


            Here's an example of a syntax diagram for a selected lexer rule.


            Lexer rule syntax diagram

            Here's an example of a syntax diagram for a selected parser rule.

            Parser rule syntax diagram

            Here's an example of requesting a grammar check, followed by a successful result.


            ANTLRWorks check grammar request

            ANTLRWorks check grammar result

            Using the ANTLRWorks debugger is simple when the lexer and parser rules are combined in a single grammar file, unlike our example. Press the Debug toolbar button (with a bug on it), enter input text or select an input file, select the start rule (allows debugging a subset of the grammar) and press the OK button. Here's an example of entering the input for a different, simpler grammar that defines the lexer and parser rules in a single file:


            ANTLRWorks debugger input

            The debugger controls and output are displayed at the bottom of the ANTLRWorks window. Here's an example using that same, simpler grammar:


            ANTLRWorks debugger output

            Using the debugger when the lexer and parser rules are in separate files, like in our example, is a bit more complicated. See the ANTLR Wiki page titled "When do I need to use remote debugging."



            Part VII - Putting It All Together


            Translator's note: finally made it this far.

            Using Generated Classes


            Next we need to write a class to utilize the classes generated by ANTLR. We'll call ours Processor. This class will use MathLexer (extends Lexer), MathParser (extends Parser) and MathTree (extends TreeParser). Note that the classes Lexer, Parser and TreeParser all extend the class BaseRecognizer. Our Processor class will also use other classes we wrote to model our domain. These classes are named Term, Function and Polynomial. We'll support two modes of operation, batch and interactive.
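
The domain classes are not listed in this article, so the following is only a guess at what a term might look like, based on how the tree grammar constructs terms with new Term(c) and new Term(c, variable, exp). The class name TermSketch and the methods getValue and getDerivative as written here are assumptions for illustration, not the author's code.

```java
// Hypothetical model of one polynomial term: coefficient * variable^exponent.
class TermSketch {
    final double coefficient;
    final String variable; // null for a constant term
    final double exponent;

    // Constant term such as 4.
    TermSketch(double coefficient) {
        this(coefficient, null, 0.0);
    }

    // General term such as 3x^2.
    TermSketch(double coefficient, String variable, double exponent) {
        this.coefficient = coefficient;
        this.variable = variable;
        this.exponent = exponent;
    }

    // Evaluates the term for a given value of its variable.
    double getValue(double x) {
        return coefficient * Math.pow(x, exponent);
    }

    // Power rule: d/dx (c * x^e) = (c * e) * x^(e - 1); a constant becomes 0.
    TermSketch getDerivative() {
        if (variable == null || exponent == 0.0) {
            return new TermSketch(0.0);
        }
        return new TermSketch(coefficient * exponent, variable, exponent - 1.0);
    }

    public static void main(String[] args) {
        TermSketch t = new TermSketch(3.0, "x", 2.0);
        System.out.println(t.getValue(2.0));
        System.out.println(t.getDerivative().getValue(2.0));
    }
}
```

Under this model a Polynomial would hold a list of such terms and a Function would pair a name and a variable with a Polynomial, which matches how the tree grammar uses those classes.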


            Here's our Processor class.


            package com.ociweb.math;
            import java.io.*;
            import java.util.Scanner;
            import org.antlr.runtime.*;
            import org.antlr.runtime.tree.*;
            public class Processor {
              public static void main(String[] args) throws IOException, RecognitionException {
                if (args.length == 0) {
                  new Processor().processInteractive();
                } else if (args.length == 1) { // name of file to process was passed in
                  new Processor().processFile(args[0]);
                } else { // more than one command-line argument
                  System.err.println("usage: java com.ociweb.math.Processor [file-name]");
                }
              }
              private void processFile(String filePath) throws IOException, RecognitionException {
                CommonTree ast = getAST(new FileReader(filePath));
                //System.out.println(ast.toStringTree()); // for debugging
                processAST(ast);
              }
              private CommonTree getAST(Reader reader) throws IOException, RecognitionException {
                MathParser tokenParser = new MathParser(getTokenStream(reader));
                MathParser.script_return parserResult =
                  tokenParser.script(); // start rule method
                reader.close();
                return (CommonTree) parserResult.getTree();
              }
              private CommonTokenStream getTokenStream(Reader reader) throws IOException {
                MathLexer lexer = new MathLexer(new ANTLRReaderStream(reader));
                return new CommonTokenStream(lexer);
              }
              private void processAST(CommonTree ast) throws RecognitionException {
                MathTree treeParser = new MathTree(new CommonTreeNodeStream(ast));
                treeParser.script(); // start rule method
              }
              private void processInteractive() throws IOException, RecognitionException {
                MathTree treeParser = new MathTree(null); // a TreeNodeStream will be assigned later
                Scanner scanner = new Scanner(System.in);
                while (true) {
                  System.out.print("math> ");
                  String line = scanner.nextLine().trim();
                  if ("quit".equals(line) || "exit".equals(line)) break;
                  processLine(treeParser, line);
                }
              }
              // Note that we can't create a new instance of MathTree for each
              // line processed because it maintains the variable and function Maps.
              private void processLine(MathTree treeParser, String line) throws RecognitionException {
                // Run the lexer and token parser on the line.
                MathLexer lexer = new MathLexer(new ANTLRStringStream(line));
                MathParser tokenParser = new MathParser(new CommonTokenStream(lexer));
                MathParser.statement_return parserResult =
                  tokenParser.statement(); // start rule method
                // Use the token parser to retrieve the AST.
                CommonTree ast = (CommonTree) parserResult.getTree();
                if (ast == null) return; // line is empty
                // Use the tree parser to process the AST.
                treeParser.setTreeNodeStream(new CommonTreeNodeStream(ast));
                treeParser.statement(); // start rule method
              }
            } // end of Processor class

            Ant Tips


            Ant is a great tool for automating tasks used to develop and test grammars. Suggested independent "targets" include the following.


            • Use org.antlr.Tool to generate Java classes and ".tokens" files from each grammar file.
              • ".tokens" files assign integer constants to token names and are used by org.antlr.Tool when processing subsequent grammar files.
              • The "uptodate" task can be used to determine whether the grammar has changed since the last build.
              • The "unless" target attribute can be used to avoid running org.antlr.Tool if the grammar hasn't changed since the last build.
            • Compile Java source files.
            • Run automated tests.
            • Run the application using a specific file as input.
            • Delete all generated files (clean target).

            For examples of all of these, download the source code from the URL listed at the end of this article and see the build.xml file.
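
The idea behind combining "uptodate" with "unless" is a timestamp comparison: regenerate only when the grammar is newer than what was generated from it. The sketch below expresses roughly that comparison in plain Java. It is not Ant's implementation, and the file names in main are assumed for illustration.

```java
import java.io.File;

// Rough Java equivalent of an up-to-date check; not Ant's actual logic.
class UpToDateSketch {
    // Timestamps are milliseconds since the epoch. A target timestamp of 0
    // means the file does not exist, following the File.lastModified convention.
    static boolean isUpToDate(long sourceModified, long targetModified) {
        return targetModified != 0L && targetModified >= sourceModified;
    }

    static boolean isUpToDate(File grammar, File generated) {
        return isUpToDate(grammar.lastModified(), generated.lastModified());
    }

    public static void main(String[] args) {
        File grammar = new File("MathParser.g");      // assumed file name
        File generated = new File("MathParser.java"); // assumed file name
        if (isUpToDate(grammar, generated)) {
            System.out.println("Skipping generation; grammar unchanged.");
        } else {
            System.out.println("Grammar changed or output missing; run org.antlr.Tool.");
        }
    }
}
```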



            Part VIII - Wrap Up

            Hidden Tokens

            By default, the parser only processes tokens from the default channel. It can, however, request tokens from other channels such as the hidden channel. Tokens are assigned unique, sequential indexes regardless of the channel to which they are written, so parser code can determine the order in which tokens were encountered across all channels.


            Here are some related public constants and methods from the Token class.


            • static final int DEFAULT_CHANNEL
            • static final int HIDDEN_CHANNEL
            • int getChannel() This gets the number of the channel where this Token was written.
            • int getTokenIndex() This gets the index of this Token.

            Here are some related public methods from the CommonTokenStream class, which implements the TokenStream interface.

            • Token get(int index) This gets the Token found at a given position in the input.
            • List getTokens(int start, int stop) This gets a List of Tokens found between given positions in the input.
            • int index() This gets the index of the last Token that was read.
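
To make the channel and index ideas concrete, here is a minimal sketch in plain Java. It deliberately does not use the ANTLR runtime: SimpleToken, sample and hiddenBefore are hypothetical stand-ins, and the channel numbers are chosen for this sketch. It shows how stream-wide indexes let code find the hidden tokens (whitespace and a comment) that immediately precede a token on the default channel.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenTokenDemo {
    // Channel numbers chosen for this sketch; real code would use the
    // Token.DEFAULT_CHANNEL and Token.HIDDEN_CHANNEL constants instead.
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99;

    /** Hypothetical stand-in for a token: text, channel and stream-wide index. */
    static final class SimpleToken {
        final String text;
        final int channel;
        final int index;

        SimpleToken(String text, int channel, int index) {
            this.text = text;
            this.channel = channel;
            this.index = index;
        }
    }

    /** Tokens for the two-line input "a // first" and "+ b", indexed across both channels. */
    static List<SimpleToken> sample() {
        String[] texts = {"a", " ", "// first", "\n", "+", " ", "b"};
        boolean[] hidden = {false, true, true, true, false, true, false};
        List<SimpleToken> tokens = new ArrayList<SimpleToken>();
        for (int i = 0; i < texts.length; i++) {
            int channel = hidden[i] ? HIDDEN_CHANNEL : DEFAULT_CHANNEL;
            tokens.add(new SimpleToken(texts[i], channel, i));
        }
        return tokens;
    }

    /** Returns the run of hidden-channel tokens that ends just before index i. */
    static List<SimpleToken> hiddenBefore(List<SimpleToken> tokens, int i) {
        int start = i;
        while (start > 0 && tokens.get(start - 1).channel == HIDDEN_CHANNEL) {
            start--;
        }
        return new ArrayList<SimpleToken>(tokens.subList(start, i));
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = sample();
        // Index 4 is the "+" token; list the hidden tokens in front of it.
        for (SimpleToken t : hiddenBefore(tokens, 4)) {
            System.out.println(t.index + ": channel " + t.channel);
        }
    }
}
```

With the real classes, the same walk would presumably start from getTokenIndex() on a parsed token, call get(int) on the CommonTokenStream to step backwards, and compare getChannel() against Token.HIDDEN_CHANNEL.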

            Advanced Topics

            We have demonstrated the basics of using ANTLR. For information on advanced topics, see the slides from the presentation on which this article was based at http://www.ociweb.com/mark/programming/ANTLR3.html. This web page contains links to the slides and the code presented in this article. The advanced topics covered in these slides include the following.

            • remote debugging
            • using the StringTemplate library
            • details on the use of lookahead in grammars
            • three kinds of semantic predicates: validating, gated and disambiguating
            • syntactic predicates
            • customizing error handling
            • gUnit grammar unit testing framework

            Projects Using ANTLR

            Many programming languages have been implemented using ANTLR. These include Boo, Groovy, Mantra, Nemerle and XRuby.

            Many other kinds of tools use ANTLR in their implementation. These include Hibernate (for its HQL to SQL query translator), IntelliJ IDEA, Jazillian (translates COBOL, C and C++ to Java), JBoss Rules (formerly Drools), Keynote (from Apple), WebLogic (from Oracle), and many more.

            Books

            Currently only one book on ANTLR is available. Terence Parr, creator of ANTLR, wrote "The Definitive ANTLR Reference." It is published by "The Pragmatic Programmers." Terence is working on a second book for the same publisher that may be titled "ANTLR Recipes." (Translator's note: I do not know whether this is still the case.)

            Summary

            There you have it! ANTLR is a great tool for generating custom language parsers. We hope this article will make it easier to get started creating validators, processors and translators.

            References

            posted on 2009-12-07 00:13 Morya
