Multiline String & repeated terms

Alex Jeannopoulos
I am working through two issues that I would like some advice on. Attached are a grammar file and a protocol file; highlights of the grammar are included in this message.

1. I am trying to make "CODE_DEFINITION" a repeated term, but I get the following error when the CODE term is repeated:

[29:1] unexpected token "CODE", expected <EOF>, on line 29 column: 1


2. In my regex for MULTILINE_STRING, a double-quoted value inside the string breaks the parsing. Any thoughts on how I can better support a Python-like multi-line string without escaping quotes within the """ block?

Thanks in advance.

CODE_LANG = PAREN_OPEN IDENTIFIER_STRING PAREN_CLOSE;

CODE_DEFINITION = CODE CODE_LANG? TEXT;


EventDefinition = SERVER_TYPE
CODE_TYPE
EVENT_TYPE
IDENTIFIER_STRING DEC_NUMBER
VERSION_TAG?
PAYLOAD
PAYLOAD_DESCRIPTION_INFO
EVENT_DESCRIPTION_INFO
CATEGORY_INFO
ACTIVE_STATE?
CODE_DEFINITION*
;


MULTILINE_STRING = <<(^""")([^*]|\*+[^*/])*("""$)>>
QUOTED_STRING    = <<"([^"]|"")*+">>
TEXT             = QUOTED_STRING | MULTILINE_STRING;

_______________________________________________
Grammatica-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/grammatica-users

Attachments: test.grammar (5K), test.proto (1K)

Re: Multiline String & repeated terms

Alex Jeannopoulos
I solved one of my issues myself, but one is still lingering: I want to support a multi-line string similar to Python's. I have the following defined:

%tokens%
MULTILINE_STRING                                   = <<(""").*(""")>>
QUOTED_STRING                                      = <<"([^"]|"")*+">>

%productions%
TEXT              = QUOTED_STRING | MULTILINE_STRING;

It seems that every string, even a multi-line string delimited by """ (three quotes), gets matched as a QUOTED_STRING. Any ideas?
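One way to see why QUOTED_STRING wins is to try both token patterns on a triple-quoted sample outside Grammatica. The sketch below uses Python's re module as a stand-in for Grammatica's regex engine; that substitution is an assumption, since the two engines are not identical, and the possessive *+ from the grammar is written as a plain * so the sketch also runs on Python versions before 3.11. It shows that the "" alternative in QUOTED_STRING lets it consume the whole """...""" block, newline included, while the .* in MULTILINE_STRING stops at the line break.

```python
import re

# Sketch only: Python's re stands in for Grammatica's regex engine (assumption).
# Token patterns copied from the grammar above; the possessive *+ in
# QUOTED_STRING is written as * so this also runs on Python < 3.11.
MULTILINE_STRING = re.compile(r'(""").*(""")')
QUOTED_STRING = re.compile(r'"([^"]|"")*"')

sample = '"""first line\nsecond line"""'

# "." does not match "\n" by default, so the multi-line pattern fails here.
print(MULTILINE_STRING.fullmatch(sample))  # None

# QUOTED_STRING reads the opening and closing "" pairs as escaped quotes,
# and [^"] also matches "\n", so it swallows the entire block.
print(QUOTED_STRING.fullmatch(sample) is not None)  # True
```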


In reply to Alex Jeannopoulos, Sun, Apr 26, 2015 at 2:45 AM.

Re: Multiline String & repeated terms

Per Cederberg
A quick search on stackoverflow gives you this:

That is, the problem is that the regex .* doesn't match newlines. There are lots of ways to handle this. Unfortunately there is no "dotall" flag in Grammatica though.

/Per
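The newline problem described above is easy to reproduce. In the sketch below, Python's re module stands in for Grammatica's engine (an assumption, not something the thread states). The default . fails on a line break, while the character class [\s\S], which matches any whitespace or non-whitespace character and therefore any character at all, spans lines without needing a flag. Python's re.DOTALL is included only for comparison, since Grammatica has no "dotall" flag.

```python
import re

# Sketch only: Python's re stands in for Grammatica's regex engine (assumption).
sample = '"""line one\nline two"""'

# Default "." excludes "\n", so this pattern cannot span lines.
dot = re.compile(r'""".*"""')
# [\s\S] matches every character, including "\n", with no flags needed.
any_char = re.compile(r'"""[\s\S]*"""')
# Python's DOTALL makes "." match "\n" too (Grammatica has no such flag).
dotall = re.compile(r'""".*"""', re.DOTALL)

for name, pattern in [("dot", dot), ("any_char", any_char), ("dotall", dotall)]:
    print(name, pattern.fullmatch(sample) is not None)
# dot False
# any_char True
# dotall True
```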


In reply to Alex Jeannopoulos, Sun, Apr 26, 2015 at 9:11 PM.

Re: Multiline String & repeated terms

Alex Jeannopoulos
Since there is no "dotall" flag, I switched my regex to the following, but MULTILINE_STRING never gets matched; only QUOTED_STRING does. Could you suggest a better regex to capture this token?

MULTILINE_STRING = <<(""")(.*\n)*(""")>>
QUOTED_STRING    = <<"([^"]|"")*+">>

TEXT             = MULTILINE_STRING | QUOTED_STRING;

I am looking to create a Python-like multi-line string that can contain both single and double quotes. Thanks




In reply to Per Cederberg, Mon, Apr 27, 2015 at 3:43 AM.

Re: Multiline String & repeated terms

Per Cederberg
Try a regex tester to verify your regex against example input:


You can also try running only the Grammatica tokenizer to see what happens.

Some issues with your example:

1. Parentheses are used where they are not needed, which makes the regex harder to read.
2. It uses .*\n instead of [\s\S] or something similar, as suggested in the Stack Overflow links.
3. The * operator is greedy, so the multi-line token might become "too long", running on to the last """ in the input.

Something like this could work better: """[\s\S]*?"""

/Per
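The three points above can be checked with a quick script instead of an online tester. The sketch below again uses Python's re as a stand-in for Grammatica's regex engine, which is an assumption. It runs Alex's (.*\n)* pattern, a greedy [\s\S]* pattern, and the lazy """[\s\S]*?""" suggestion against one input holding two triple-quoted strings, the first of which spans two lines and contains ordinary double quotes.

```python
import re

# Sketch only: Python's re stands in for Grammatica's regex engine (assumption).
# Two triple-quoted strings in one input; the first spans two lines
# and contains ordinary double quotes.
text = '"""He said "hi"\nand left""" x """second"""'

# Alex's attempt: the closing """ must begin a new line, so this fails here.
line_based = re.compile(r'(""")(.*\n)*(""")')
# Greedy: [\s\S]* runs on to the LAST """ in the input.
greedy = re.compile(r'"""[\s\S]*"""')
# Lazy: *? stops at the FIRST closing """.
lazy = re.compile(r'"""[\s\S]*?"""')

print(line_based.match(text))          # None
print(repr(greedy.match(text).group()))  # the whole input, both strings
print(repr(lazy.match(text).group()))    # only the first string
```

With the lazy quantifier, single and double quotes inside the block are fine, but three quotes in a row always end the token, so a literal """ cannot appear inside the string.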

In reply to Alex Jeannopoulos, Mon, Apr 27, 2015 at 4:11 PM.