HTML grammar??

John Kleven-2
Hi all,

Curious if anybody has used Grammatica to create an
HTML parser?

Not sure if that's a good fit for Grammatica or not, but
it seemed like it might be.  The existing C# HTML
parsers out there all seem to leave something (or
quite a bit) to be desired.

Any info appreciated!
Thanks
John

_______________________________________________
Grammatica-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/grammatica-users

Re: HTML grammar??

Per Cederberg
Well, I guess it would be possible to write an HTML
grammar for Grammatica. The real question is whether
it would be a good fit. The thing with HTML is that
*lots* of real-world web pages are syntactically
invalid.

So I think that to write a good HTML parser, one really
needs to do it by hand, adding special code
everywhere to recover from common problems and
issues.
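
To illustrate what "special code to recover" could look like, here is a
minimal sketch in Python (not Grammatica, and not taken from any existing
parser). It assumes the input has already been split into tokens, and it
handles only two problems: a <p> or <li> that is never closed, and end
tags that are stray or mismatched. The names build_tree and AUTO_CLOSE
are hypothetical.

```python
# Hypothetical helper, not part of Grammatica: builds a tree from tokens
# while recovering from two common problems in real-world HTML.

# Assumed simplification: these elements implicitly close an open
# element of the same name when they appear directly inside it.
AUTO_CLOSE = {"p", "li"}


def build_tree(tokens):
    """tokens: iterable of ("open", name), ("close", name) or ("text", data)."""
    root = {"tag": "#root", "children": []}
    stack = [root]  # stack[-1] is the element currently being filled
    for kind, value in tokens:
        if kind == "text":
            stack[-1]["children"].append(value)
        elif kind == "open":
            # Recovery 1: <p>one<p>two is treated as <p>one</p><p>two
            if value in AUTO_CLOSE and stack[-1]["tag"] == value:
                stack.pop()
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "close":
            open_tags = [n["tag"] for n in stack[1:]]
            if value in open_tags:
                # Recovery 2: close any unclosed elements nested inside the match
                while stack[-1]["tag"] != value:
                    stack.pop()
                stack.pop()
            # Otherwise this is a stray end tag and is silently ignored.
    return root


tokens = [
    ("open", "p"), ("text", "one"),
    ("open", "p"), ("text", "two"),
    ("close", "div"),               # stray end tag
    ("open", "b"), ("text", "x"),
    ("close", "p"),                 # <b> was never closed
]
tree = build_tree(tokens)
```

Each rule here is a special case bolted onto the loop rather than something
derived from a grammar, and a real parser would need many more of them.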

Also, HTML has a very lenient syntax, allowing new
unknown tags to be used, end tags to be omitted, etc.
So it is very hard to create a correct BNF grammar
that covers all of that and still provides something
more than a pure tokenizer.
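
For comparison, a "pure tokenizer" of that kind might look roughly like
the Python sketch below. The regular expression is an assumed, heavily
simplified pattern (it does not handle comments, or ">" inside quoted
attribute values), and tokenize is a hypothetical name. It accepts any
tag name and reports nothing about nesting.

```python
import re

# Assumed, simplified pattern: a tag with any name and optional attributes,
# or a run of text, or a lone "<" that does not start a tag.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>|([^<]+)|(<)")


def tokenize(html):
    """Split html into ("open", name), ("close", name) and ("text", data)."""
    tokens = []
    for m in TOKEN.finditer(html):
        closing, name, text, stray = m.groups()
        if name:
            # Unknown tag names are accepted as-is, only lower-cased.
            tokens.append(("close" if closing else "open", name.lower()))
        else:
            # A lone "<" is kept as literal text instead of raising an error.
            tokens.append(("text", text if text is not None else stray))
    return tokens


result = tokenize("<p>Hi<blink>there</P>")
```

Nothing in the output says which element contains which, and that is the
part a grammar would normally be expected to provide.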

Cheers,

/Per

On Thu, 2005-12-15 at 11:33 -0800, John Kleven wrote:

> Hi all,
>
> Curious if anybody has used Grammatica to create an
> HTML parser?
>
> Not sure if that's a good fit for Grammatica or not, but
> it seemed like it might be.  The existing C# HTML
> parsers out there all seem to leave something (or
> quite a bit) to be desired.
>
> Any info appreciated!
> Thanks
> John