diff --git a/takentaal.g4 b/takentaal.g4
index d283966..4943d23 100644
--- a/takentaal.g4
+++ b/takentaal.g4
@@ -1,13 +1,14 @@
 /**
  * This file defines the grammar for takentaal.
- * It is divided into parser rules (lowercase) and lexer rules (uppercase).
+ * It is divided into parser rules (lowercase names) and lexer rules (uppercase
+ * names).
  * The parser splits an input into tokens accoring to the lexer rules.
- * At any point, all lexer rules are considered. If multiple rules match,
- * a lexer rule is chosen as follows:
+ * At any point in the input text, all lexer rules are considered. If multiple
+ * rules match, a lexer rule is chosen as follows:
  * - the rule that matches the longest input is chosen
  * - any implicit rule, e.g. 'a', is chosen
  * - the first defined rule is chosen.
- * Since this grammar has to match unquoted texts and text are usually longer
+ * Since this grammar has to match unquoted texts and texts are usually longer
  * than other token matches, texts are split into characters so that they have
  * a lower ranking.
  */
@@ -69,8 +70,8 @@ a1_0_subtask_token
     | SUBTASK_OBSOLETE_TOKEN
     ;
 
-// Any implicit and explity lexer token that may appear in a text should be listed
-// in this definition.
+// Any implicit and explity lexer token that may appear in a text should be
+// listed in this definition.
 text
     : (INT | '{' | '}' | S | CHAR | WORD)+
     ;
@@ -107,5 +108,6 @@ fragment DIGIT : [0-9] ;
 // Match printable characters, except space which is covered by S
 CHAR : [!-~\u00A0-\u33FF] ; // ASCII and UNICODE
 
-// This is a performance improvement that groups chars that do not have a special meaning
+// This is a performance improvement that groups chars that do not have a
+// special meaning
 WORD : [A-Za-z\u00A0-\u33FF]+ ;