Regex Issue

Alex Jeannopoulos
Here is a snippet of a grammar I am trying to set up. I am looking to match the following text:

PROTO CLIENT EVENT ActiveConversationsList       32 PAYLOAD[JSON(conversations):data] PAYLOAD_DESCRIPTION ""
EVENT_DESCRIPTION "List of ActiveConversations, each ActiveConversation contains conv_id, conv_name, group_id, start_indx, last_indx, uids_involved"
CATEGORY "" CODE (java)
{{{

java example
aaa
aaa
aa

}}}

Here is the portion of the grammar I have come up with:
%tokens%
IDENTIFIER_STRING   = <<[a-zA-Z][a-zA-Z0-9-_]*>>
PAREN_OPEN              = "("
PAREN_CLOSE            = ")"
CODE                            = "CODE"
CODE_TEXT                 = <<(^\{\{\{\n).*(^\}\}\}\n)>>

%productions%
CODE_LANG                = "(" IDENTIFIER_STRING ")";
CODE_DEFINITION     = CODE (CODE_LANG)? CODE_TEXT;

EventDefinition   = SERVER_TYPE
                    CODE_TYPE
                    EVENT_TYPE
                    IDENTIFIER_STRING DEC_NUMBER
                    VERSION_TAG?
                    PAYLOAD
                    PAYLOAD_DESCRIPTION_INFO
                    EVENT_DESCRIPTION_INFO
                    CATEGORY_INFO
                    ACTIVE_STATE?
                    CODE_DEFINITION?
                    ;

[ValidateProtocol] {{{
[ValidateProtocol] [1328:1] unexpected token "{{{" <CODE_BEGIN>, expected <CODE_TEXT>, on line 1328 column: 1
[ValidateProtocol] Error parsing protocol file

As long as I have CODE_DEFINITION in the EventDefinition production, I get errors. What am I missing?

Thanks in advance.

_______________________________________________
Grammatica-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/grammatica-users

Re: Regex Issue

Per Cederberg
The error message tries its best to explain what happens:

[1328:1] unexpected token "{{{" <CODE_BEGIN>, expected <CODE_TEXT>

That is, the grammar contains a token named "CODE_BEGIN" (not shown in your example) that is matching instead of CODE_TEXT (which was expected). Try removing that token (if not needed elsewhere).
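A possible reason why CODE_TEXT does not match at that position, sketched under assumptions: in many regex engines "." does not match a newline and "^" only matches after a newline in a multi-line mode, so a pattern like the CODE_TEXT one cannot span the lines between {{{ and }}}. The snippet below uses Python's re module as a stand-in. Grammatica has its own regex engine, so its handling of "." and "^" may differ and should be checked against its documentation. BLOCK and matches_block are names made up for this illustration.

```python
import re

# The code block from the question, as one string.
BLOCK = "{{{\n\njava example\naaa\naaa\naa\n\n}}}\n"

# The CODE_TEXT pattern from the question, copied verbatim.
CODE_TEXT = r"(^\{\{\{\n).*(^\}\}\}\n)"


def matches_block(pattern, flags=0):
    """Return True if the pattern matches the whole block from position 0."""
    m = re.compile(pattern, flags).match(BLOCK)
    return m is not None and m.group(0) == BLOCK


# Default flags: '.' stops at the first newline and '^' only matches at offset 0.
print(matches_block(CODE_TEXT))                            # False
# MULTILINE lets '^' match after a newline, but '.' still cannot cross lines.
print(matches_block(CODE_TEXT, re.MULTILINE))              # False
# Only when '.' may also match a newline does the pattern cover the block.
print(matches_block(CODE_TEXT, re.MULTILINE | re.DOTALL))  # True
```

If Grammatica behaves like the first two cases, CODE_TEXT never matches at "{{{", which would leave that text to whatever other token can match it.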

Cheers,

/Per

On Fri, Apr 24, 2015 at 4:56 AM, Alex Jeannopoulos <[hidden email]> wrote:
Here is a snippet of a grammar I am trying to set up. [...]

Re: Regex Issue

Oliver Gramberg-2
In reply to this post by Alex Jeannopoulos
Hi Alex,
 
It seems to me that there is a token CODE_BEGIN="{{{" or equivalent somewhere in your grammar BEFORE the "CODE_TEXT" token that you are not showing here. If I remember correctly, Grammatica has a "first defined, first matched" rule, or was it maybe "keywords before regex matches"? What happens if you remove the CODE_BEGIN token? The triple braces are part of CODE_TEXT anyway.
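The kind of matching rule mentioned above can be pictured with a toy tokenizer. The sketch below assumes a "longest match wins, ties go to the token defined first" policy. Whether Grammatica resolves conflicts exactly this way is an assumption here, not something confirmed in this thread. The token tables, next_token, and the CODE_BEGIN definition are hypothetical, and Python's re module stands in for Grammatica's regex engine.

```python
import re

# Hypothetical token table in definition order: (name, compiled pattern).
# The character class is written [a-zA-Z0-9_-] to keep '-' unambiguous in Python.
TOKENS = [
    ("CODE", re.compile(r"CODE")),
    ("CODE_BEGIN", re.compile(r"\{\{\{")),
    ("IDENTIFIER_STRING", re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")),
    ("CODE_TEXT", re.compile(r"\{\{\{\n.*\}\}\}\n")),
]

# Same table, but CODE_TEXT is allowed to cross newlines (re.DOTALL).
TOKENS_DOTALL = TOKENS[:3] + [
    ("CODE_TEXT", re.compile(r"\{\{\{\n.*\}\}\}\n", re.DOTALL)),
]


def next_token(text, pos=0, tokens=TOKENS):
    """Pick the longest match at pos; on a tie the earliest-defined token wins."""
    best = None
    for name, pattern in tokens:
        m = pattern.match(text, pos)
        # Strictly longer only, so an earlier token keeps a tie.
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (name, m.group(0))
    return best


block = "{{{\n\njava example\n}}}\n"

print(next_token("CODE (java)"))               # ('CODE', 'CODE')
print(next_token(block))                       # ('CODE_BEGIN', '{{{')
print(next_token(block, tokens=TOKENS_DOTALL)[0])  # CODE_TEXT
```

Under these assumptions, removing CODE_BEGIN alone would not help while CODE_TEXT cannot span lines: nothing would match "{{{" at all, which would surface as an "unexpected character" error instead.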
 
Regards
Oliver

Re: Regex Issue

Alex Jeannopoulos
The full grammar file is listed at the bottom. I am trying to define a multi-line string, so that I can have code blocks (multi-line strings) defined in my data file.

Here is how I defined a multiline string.

MULTILINE_STRING                        = <<'''.*'''>>

I would like to force ''' (three single quotes) to be pinned to the start of a line and followed by "\n", but I keep getting an error:
[5:1] unexpected character ''', on line 5 column: 1
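One thing that may be worth trying, offered as a sketch rather than a confirmed fix: if "." does not match a newline in the regex engine, then '''.*''' can never reach a closing ''' on a later line. A pattern built from negated character classes does not depend on ".", which is the same idea the MULTI_LINE_COMMENT token in the full grammar already uses. The snippet below checks this with Python's re module as a stand-in. Whether Grammatica's engine treats these patterns the same way is an assumption, and SAMPLE, DOT_PATTERN, CLASS_PATTERN and match_at_start are illustrative names.

```python
import re

# The multi-line string from the protocol file, as one string.
SAMPLE = "'''\nTest java region\n\n'''\n"

# Pattern from the question: relies on '.' to cover the body.
DOT_PATTERN = re.compile(r"'''.*'''")

# Alternative: the body is any run of non-quotes, or one or two quotes
# followed by a non-quote, so it stops only at three quotes in a row.
CLASS_PATTERN = re.compile(r"'''([^']|'[^']|''[^'])*'''")


def match_at_start(pattern, text):
    """Return the text matched at position 0, or None if there is no match."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(match_at_start(DOT_PATTERN, SAMPLE))           # None
print(repr(match_at_start(CLASS_PATTERN, SAMPLE)))   # "'''\nTest java region\n\n'''"

# Hypothetical, untested Grammatica form of the same idea:
#   MULTILINE_STRING = <<'''([^']|'[^']|''[^'])*'''>>
```

This version does not pin ''' to the start of a line. It only shows one way to let the token body cross line breaks.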

The protocol file that is read in is: 

PROTO CLIENT EVENT ActiveConversationsList       32 PAYLOAD[JSON(conversations):data] PAYLOAD_DESCRIPTION ""
EVENT_DESCRIPTION "List of ActiveConversations, each ActiveConversation contains conv_id, conv_name, group_id, start_indx, last_indx, uids_involved"
CATEGORY ""
CODE(java)
'''
Test java region

'''

Any help or pointers to clean up my grammar are welcome. Thanks.


The Grammar:
%header%

GRAMMARTYPE = "LL"

DESCRIPTION = "A test protocol grammar"

AUTHOR      = ""
VERSION     = "0.1"
DATE        = ""

COPYRIGHT   = "Copyright (c) 2015. All rights reserved."


%tokens%

PROTO                                   = "PROTO"


CLIENT                                  = "CLIENT"
SERVER                                  = "SERVER"

EVENT                                   = "EVENT"
WEVENT                                  = "WEVENT"

BYTE                                    = "BYTE"
CHAR                                    = "CHAR"
SHORT                                   = "SHORT"
INT                                     = "INT"
INT_LIST                                = "INT_LIST"
INT_LIST_INTLEN                         = "INT_LIST_INTLEN"
LONG                                    = "LONG"
LONG_LIST                               = "LONG_LIST"
IP_PORT_BYTES                           = "IP_PORT_BYTES"
IP_ADDRESS                              = "IP_ADDRESS"
BYTES                                   = "BYTES"
JSON                                    = "JSON"
STRING                                  = "STRING"
CODE                                    = "CODE"
STRINGLIST_NLDELIM                      = "STRINGLIST_NLDELIM"


CATEGORY                                = "CATEGORY"
EVENT_DESCRIPTION                       = "EVENT_DESCRIPTION"
PAYLOAD_DESCRIPTION                     = "PAYLOAD_DESCRIPTION"
VERSION                                 = "VERSION"
ID                                      = "ID"
optional                                = "optional"

COLON                                   = ":"
COMMA                                   = ","
PAYLOAD_OPEN                            = "PAYLOAD["
PAYLOAD_CLOSE                           = "]"
ACTIVE                                  = "ACTIVE"
TRUE                                    = <<TRUE|true>>
FALSE                                   = <<FALSE|false>>

PAREN_OPEN                              = "("
PAREN_CLOSE                             = ")"

SEPARATOR                               = "|"

SINGLE_LINE_COMMENT                     = <<//.*>> %ignore%
MULTI_LINE_COMMENT                      = <</\*([^*]|\*+[^*/])*\*+/>> %ignore%
IDENTIFIER_STRING                       = <<[a-zA-Z][a-zA-Z0-9-_]*>>
QUOTED_STRING                           = <<"([^"]|"")*+">>
MULTILINE_STRING                        = <<'''.*'''>>
WHITESPACE                              = <<[ \t\n\r]+>> %ignore%
COMMENT                                 = <<--([^\n\r-]|-[^\n\r-])*(--|-?[\n\r])>> %ignore%
DEC_NUMBER                              = <<[+-]?[1-9][0-9]*>>
STAR                                    = <<\*>>




/** Production definitions **/
%productions%
Start = EventDefinition +;

SERVER_TYPE       = PROTO;


OPTIONS_KEYS      = PAREN_OPEN  (IDENTIFIER_STRING [optional] ((COMMA|SEPARATOR) IDENTIFIER_STRING [optional] )* )?  PAREN_CLOSE ;



CODE_TYPE         = CLIENT | SERVER;

EVENT_TYPE        = WEVENT | EVENT;

ACTIVE_STATE      = ACTIVE ( TRUE | FALSE);

VERSION_TAG       = VERSION DEC_NUMBER;

PAYLOAD_ITEM_TYPE = BYTE |
                    CHAR |
                    SHORT |
                    INT |
                    INT_LIST |
                    INT_LIST_INTLEN |
                    LONG |
                    LONG_LIST |
                    IP_PORT_BYTES |
                    IP_ADDRESS |
                    BYTES PAREN_OPEN ( DEC_NUMBER | STAR ) PAREN_CLOSE |
                    STRING PAREN_OPEN ( DEC_NUMBER | STAR ) PAREN_CLOSE|
                    STRINGLIST_NLDELIM;
                    
PAYLOAD_ITEM      = PAYLOAD_ITEM_TYPE COLON IDENTIFIER_STRING;

PAYLOAD           = PAYLOAD_OPEN (PAYLOAD_ITEM (COMMA PAYLOAD_ITEM)*)? PAYLOAD_CLOSE;

CATEGORY_INFO     = CATEGORY QUOTED_STRING;

EVENT_DESCRIPTION_INFO  = EVENT_DESCRIPTION QUOTED_STRING;
PAYLOAD_DESCRIPTION_INFO  = PAYLOAD_DESCRIPTION QUOTED_STRING;

CODE_LANG           = PAREN_OPEN IDENTIFIER_STRING PAREN_CLOSE;
CODE_TEXT           = QUOTED_STRING | MULTILINE_STRING;
CODE_DEFINITION     = CODE CODE_LANG? CODE_TEXT;


EventDefinition   = SERVER_TYPE
                    CODE_TYPE
                    EVENT_TYPE
                    IDENTIFIER_STRING DEC_NUMBER
                    VERSION_TAG?
                    PAYLOAD
                    PAYLOAD_DESCRIPTION_INFO
                    EVENT_DESCRIPTION_INFO
                    CATEGORY_INFO
                    ACTIVE_STATE?
                    CODE_DEFINITION?
                    ;



On Fri, Apr 24, 2015 at 3:51 AM, Oliver Gramberg <[hidden email]> wrote:
it seems to me that there is a token CODE_BEGIN="{{{" or equivalent somewhere in your grammar [...]