How to treat nested comments?

Oliver Gramberg

Hello Grammatica users,

I want to write a parser for a language that allows nested comments: /* ... /* ... */ ... */ is valid, but /* ... /* ... */ is not. Obviously, I cannot cover that with a single regular expression. I started by defining tokens similar to the following (don't worry about correctness here ;-) ):

NESTED_COMMENT_CONTENTS = << ... (ugly regexp matching anything except COMMENT_START or COMMENT_END) >>
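Recognizing a whole nested comment needs a counter, which a regular expression (a finite automaton) cannot provide for unbounded nesting depth. The Java sketch below is a hypothetical stand-alone helper, not part of Grammatica; the class name NestedCommentMatcher and method matchNestedComment are made up for illustration, and it assumes /* and */ are the only delimiters (string literals containing them are ignored).

```java
// Hypothetical helper, not part of Grammatica: recognizes one nested comment
// by counting nesting depth in a single left-to-right pass.
final class NestedCommentMatcher {

    // Returns the index just past the outermost closing delimiter, or -1 if no
    // comment starts at 'start' or the comment is not terminated.
    static int matchNestedComment(String input, int start) {
        if (!input.startsWith("/*", start)) {
            return -1;
        }
        int depth = 0;
        int i = start;
        while (i < input.length()) {
            if (input.startsWith("/*", i)) {
                depth++;              // a (possibly nested) comment opens
                i += 2;
            } else if (input.startsWith("*/", i)) {
                depth--;              // the innermost open comment closes
                i += 2;
                if (depth == 0) {
                    return i;         // the outermost comment is complete
                }
            } else {
                i++;                  // ordinary comment text
            }
        }
        return -1;                    // end of input reached with depth > 0
    }

    public static void main(String[] args) {
        System.out.println(matchNestedComment("/* a /* b */ c */ rest", 0));
        System.out.println(matchNestedComment("/* a /* b */", 0));
    }
}
```

The first call returns 17 (the position just after the outer comment); the second returns -1 because the outer comment is never closed. Each character is examined a constant number of times, so the scan is linear in the comment length.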

One big problem with this is that NESTED_COMMENT_CONTENTS, as intended, matches anything except COMMENT_START or COMMENT_END, so a single match can extend from the current position all the way to the end of the input file! Because the tokenizer may attempt such a match at many positions, the running time goes from close to O(n) to something like O(n^2): 102 seconds on a 34k input file.
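A rough count of why this becomes quadratic, assuming the tokenizer starts a match attempt at each of the n input positions and, in the worst case, each attempt scans to the end of the input:

```latex
\sum_{i=0}^{n-1} (n - i) \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```

If "34k" means roughly n = 34,000 characters, that is about 5.8 × 10^8 character steps, against about 3.4 × 10^4 for a single linear pass.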

Before NFAs were introduced into the tokenizer (Grammatica up to 1.5 alpha 2, if I'm right), my solution was to:
- add an "enabled" flag to the token patterns,
- hack the tokenizer so that it does not match token patterns that are not enabled,
- keep track of the number of COMMENT_START and COMMENT_END tokens encountered, and
- enable NESTED_COMMENT_CONTENTS only when "inside" a comment.
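The idea in the list above can be sketched outside Grammatica as a small hand-written tokenizer. Everything here is hypothetical (the class DepthGatedTokenizer, the OTHER placeholder token, emitting one OTHER token per character outside comments); it does not use Grammatica's tokenizer API, and it needs Java 16+ for the record. The depth counter plays the role of the COMMENT_START/COMMENT_END bookkeeping, and the contents pattern is only tried while the depth is positive, stopping at the next delimiter, so the whole pass stays linear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a depth-gated "enabled flag"; NOT Grammatica's API.
// OTHER stands in for the real tokens of the language outside comments.
final class DepthGatedTokenizer {

    enum Type { COMMENT_START, COMMENT_END, NESTED_COMMENT_CONTENTS, OTHER }

    record Token(Type type, String text) {}

    static List<Token> tokenize(String in) {
        List<Token> out = new ArrayList<>();
        int depth = 0;   // number of currently open comments
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("/*", i)) {
                out.add(new Token(Type.COMMENT_START, "/*"));
                depth++;
                i += 2;
            } else if (depth > 0 && in.startsWith("*/", i)) {
                out.add(new Token(Type.COMMENT_END, "*/"));
                depth--;
                i += 2;
            } else if (depth > 0) {
                // Contents pattern is "enabled" only inside a comment and stops at
                // the next delimiter, so each character is examined a constant
                // number of times.
                int end = i;
                while (end < in.length()
                        && !in.startsWith("/*", end) && !in.startsWith("*/", end)) {
                    end++;
                }
                out.add(new Token(Type.NESTED_COMMENT_CONTENTS, in.substring(i, end)));
                i = end;
            } else {
                // Outside comments the contents pattern is disabled; emit a
                // one-character placeholder token instead.
                out.add(new Token(Type.OTHER, String.valueOf(in.charAt(i))));
                i++;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unterminated comment, depth " + depth);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("x/* a /* b */ c */y")) {
            System.out.println(t.type() + " \"" + t.text() + "\"");
        }
    }
}
```

For x/* a /* b */ c */y this yields nine tokens (two COMMENT_START, three NESTED_COMMENT_CONTENTS, two COMMENT_END, and OTHER for x and y), while an unterminated input such as /* a /* b */ is rejected.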

Since the Grammatica 1.5 release, NESTED_COMMENT_CONTENTS is recognized by the new NFA implementation, in which I cannot find an easy way to disable a token pattern.

Any suggestions?


Oliver Gramberg
Forschungszentrum Deutschland
Wallstadter Str. 59
D-68526 Ladenburg

Grammatica-users mailing list