mirror of
https://codeberg.org/NLnet/takentaal.git
synced 2025-08-29 05:58:57 +00:00
Small text cleanups
parent 3bf56e0345
commit 941aef502b
1 changed file with 9 additions and 7 deletions
takentaal.g4 (16 lines changed)
@@ -1,13 +1,14 @@
 /**
  * This file defines the grammar for takentaal.
- * It is divided into parser rules (lowercase) and lexer rules (uppercase).
+ * It is divided into parser rules (lowercase names) and lexer rules (uppercase
+ * names).
  * The parser splits an input into tokens accoring to the lexer rules.
- * At any point, all lexer rules are considered. If multiple rules match,
- * a lexer rule is chosen as follows:
+ * At any point in the input text, all lexer rules are considered. If multiple
+ * rules match, a lexer rule is chosen as follows:
  * - the rule that matches the longest input is chosen
  * - any implicit rule, e.g. 'a', is chosen
  * - the first defined rule is chosen.
- * Since this grammar has to match unquoted texts and text are usually longer
+ * Since this grammar has to match unquoted texts and texts are usually longer
  * than other token matches, texts are split into characters so that they have
  * a lower ranking.
  */
@@ -69,8 +70,8 @@ a1_0_subtask_token
 | SUBTASK_OBSOLETE_TOKEN
 ;
 
-// Any implicit and explity lexer token that may appear in a text should be listed
-// in this definition.
+// Any implicit and explity lexer token that may appear in a text should be
+// listed in this definition.
 text
 : (INT | '{' | '}' | S | CHAR | WORD)+
 ;
@@ -107,5 +108,6 @@ fragment DIGIT : [0-9] ;
 // Match printable characters, except space which is covered by S
 CHAR : [!-~\u00A0-\u33FF] ; // ASCII and UNICODE
 
-// This is a performance improvement that groups chars that do not have a special meaning
+// This is a performance improvement that groups chars that do not have a
+// special meaning
 WORD : [A-Za-z\u00A0-\u33FF]+ ;
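
The rule-selection order described in the header comment (longest match, then implicit tokens, then definition order) can be illustrated with a small Python sketch. This is not the ANTLR implementation and not the real takentaal lexer: the rule table below is a simplified, hypothetical subset, the definition of `S` and the relative order of the rules are assumptions, and the Unicode ranges are omitted.

```python
import re

# Hypothetical, simplified rule table: (name, regex, is_implicit).
# List order stands in for definition order in the grammar (an assumption).
RULES = [
    ("'{'", r"\{", True),        # implicit token, written as a literal in a parser rule
    ("'}'", r"\}", True),
    ("INT", r"[0-9]+", False),
    ("S", r" +", False),         # assumed definition of S; the diff does not show it
    ("CHAR", r"[!-~]", False),
    ("WORD", r"[A-Za-z]+", False),
]


def next_token(text, pos):
    """Pick one token at pos using the three tie-breaking steps in order."""
    candidates = []
    for order, (name, pattern, implicit) in enumerate(RULES):
        match = re.compile(pattern).match(text, pos)
        if match:
            lexeme = match.group()
            # Sort key: longest match first, implicit before explicit,
            # then the rule defined first.
            candidates.append((-len(lexeme), not implicit, order, name, lexeme))
    if not candidates:
        raise ValueError(f"no rule matches at position {pos}")
    _, _, _, name, lexeme = min(candidates)
    return name, lexeme


def tokenize(text):
    """Split text into (rule name, lexeme) pairs."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens
```

In this sketch `tokenize("ab{1")` picks `WORD` for `ab` because it is the longest match, the implicit `'{'` over `CHAR` for the brace, and `INT` over `CHAR` for the digit because `INT` is listed first. A single letter such as `a` comes out as `CHAR`, since `CHAR` and `WORD` tie on length and `CHAR` is listed first.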