# Design of Semantic Latex

7 messages

## Design of Semantic Latex

(email failed... retry)

The referenced paper only looked at the DLMF Airy functions. The results are:

"Of the 17 sections in the DLMF sample chapter on Airy functions we can handle the mathematical formulas completely in 6, partially in 5 (without the transformation to MathML), and 6 remain incomplete. The grammar currently contains approximately 1000 productions, of which ca. 350 are dictionaries. There are about 550 rewrite rules. There are fewer rewrite rules than grammar rules, partly because dictionaries can be treated uniformly by manipulating literals, and partly because it is still incomplete with respect to the grammar. Our project shows that parsing mathematics in the form of LaTeX documents written to project-specific rules is feasible, but due to the variations in notation the grammar needs to be engineered specifically for the project, or even for different chapters written by different mathematicians (e.g. the chapter on elementary transcendental functions and on Airy functions)."

Many have tried to parse the DLMF, but there is not sufficient information in the latex. This effort used many rules to try to decide whether w(x+y) is a multiplication or a function application. There are rules to decide whether sin a/b means (sin a)/b or sin(a/b), either of which would be trivial to distinguish if the latex read {sin a}/b or sin{a/b}.

Trivial amounts of semantic markup in the DLMF would communicate the semantics without using 1000 productions which still get wrong answers.

_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
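The grouping ambiguity described above can be shown concretely. A minimal illustration in plain LaTeX (no semantic package assumed), mirroring the thread's own notation:

```latex
% As commonly written, the grouping is ambiguous to a parser:
$\sin a/b$        % (sin a)/b or sin(a/b)?

% Explicit braces in the source make the intended parse trivial:
${\sin a}/b$      % the quotient: (sin a)/b
$\sin{(a/b)}$     % the application: sin(a/b)
```

Both braced forms print essentially the same as the ambiguous one, which is the point: the disambiguation costs nothing on the page but removes the need for heuristic grammar rules.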

## Re: Design of Semantic Latex


## Re: Design of Semantic Latex

Branch cuts are sometimes designated in natural language. While the end points are specific -- they depend on the singularities -- the cuts can be moved for convenience, and this is often done to evaluate contour integrals, for example.

Take up a book on complex analysis and see what problems you have as you try to encode the statements, or especially the homework problems. I tried this decades ago with the text I used, https://www.amazon.com/Functions-Complex-Variable-Technique-Mathematics/dp/0898715954 but probably any other text would do.

I think the emphasis on handbook or reference book representation is natural, and I have certainly pursued this direction myself. However, what you/we want to be able to encode is mathematical discourse. This goes beyond "has the algorithm reproduced the reference value for an integration." Can you encode in semantic latex a description of the geometry of the (perhaps infinitely layered) contour of a complex function? You might wonder if this is important, but then note that questions of this sort appear in the problem section for chapter 1.

Here's the challenge then. Take a mathematics book and "encode" it so that a program (hypothetically) could answer the problems at the end of each chapter. You do not need special functions and integral tables to find problems that are too hard to handle.

I just found this: http://news.mit.edu/2014/computer-system-automatically-solves-word-problems-0502 I think the problem, algebra word problems, which has been addressed repeatedly since 1965 or so, is already difficult. While I think (judging solely by the news article -- I was unaware of this work -- which apparently used Macsyma) this is low quality, it is hard to be sure. Maybe their problems can be related to your ambitions. A quote from the article above:

"The system’s ability to perform fairly well even when trained chiefly on raw numerical answers is 'super-encouraging,' Knight adds. 'It needs a little help, but it can benefit from a bunch of extra data that you haven’t labeled in detail.'"

RJF

## Re: Design of Semantic Latex

On Sat, Aug 27, 2016 at 12:14 PM, Richard Fateman wrote:

> Take up a book on complex analysis and see what problems you have as you try to encode the statements, or especially the homework problems. I tried this decades ago with the text I used, https://www.amazon.com/Functions-Complex-Variable-Technique-Mathematics/dp/0898715954 but probably any other text would do.

My last project at CMU (Tires) involved work on machine learning using natural language (and Good-Old-Fashioned-AI (GOFAI)). I'm not smart enough to make progress in natural language.

> I think the emphasis on handbook or reference book representation is natural, and I have certainly pursued this direction myself. However what you/we want to be able to encode is mathematical discourse. This goes beyond "has the algorithm reproduced the reference value for an integration." Can you encode in semantic latex a description of the geometry of the (perhaps infinitely layered) contour of a complex function? You might wonder if this is important, but then note that questions of this sort appear in the problem section for chapter 1.

Like any research project, there have to be bounds on the ambition. At this point, the goal is to modify the markup to disambiguate a latex formula so the machine can import it. Axiom needs to import it to create a test suite measuring progress against existing knowledge.

What you're describing seems to be a way to encode topological issues dealing with the structure of the space underlying the formulas. I have no idea how to encode the Bloch sphere or a torus or any other space except by referencing an Axiom domain, which implicitly encodes it.

If the formula deals with quantum mechanics then the algorithms have an implicit, mechanistic way of dealing with the Bloch sphere. So markup that uses these function calls uses this implicit grounding. Similarly, markup that uses a branch cut implicitly uses the implementation semantics. Axiom and Mathematica have one set of branch cuts, Maple and Maxima have another (as far as I can tell). So the markup decisions have to be carefully chosen.

> Here's the challenge then. Take a mathematics book and "encode" it so that a program (hypothetically) could answer the problems at the end of each chapter.

That's a much deeper can of worms than it appears. I spent a lot of time in the question-answering literature. I have no idea how to make progress in that area. The Tires project involved self-modifying lisp based on natural language interaction with a human in the limited domain of changing a car tire. See

(The grant ended before the project ended. Sigh)

Tim

P.S. Tires is self-modifying lisp code. It "learns" by changing itself. The initial code (the seed code) becomes "something else". One interesting insight is that two versions of the seed code will diverge based on "experience". That implies that you can't "teach by copy", that is, you can't teach one system and then "just copy" it to another existing system since their experiences (and the code structure) will differ. Any system that "learns" will fail "teach by copy", I believe. That means that AI will not have the exponential growth that everyone seems to believe.

## Re: Design of Semantic Latex

The weaver program can now process a latex document. The end result is a tree structure of the same document. There is still more to do, of course. Much more.

It is clear that the semantics of the markup tags are all in the weaver program. This is obvious in hindsight since the markup needs to be transparent to the print representation.

The parser needs to know the 'arity' of tags, since it would parse \tag{a}{b} one way for a 1-arity tag (as \tag{a} followed by {b}) and another way for a 2-arity tag (as \tag{a}{b}). The code needs to be generalized to parse given the arity.

The weaver program is structured so that the tree-parse output is independent of Axiom. The Axiom rewrite will take the tree as input and produce valid Axiom inputforms. This should make it possible to target any CAS.

Onward and upward, as they say....

Tim
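The arity-driven parse described above can be sketched as follows. This is a hypothetical illustration only (the tag table, function name, and representation are my own assumptions, not the actual weaver code), showing how knowing a tag's arity tells the parser exactly how many brace groups to consume:

```python
# Hypothetical per-tag arity table; entries are illustrative,
# not the real weaver dictionary.
TAG_ARITY = {"withtype": 3, "branchcut": 1}

def parse_tag(s, pos=0):
    """Parse \\tag{a}{b}... at pos; return (tag, args, next_pos)."""
    assert s[pos] == "\\", "a tag starts with a backslash"
    end = pos + 1
    while end < len(s) and s[end].isalpha():   # scan the tag name
        end += 1
    tag = s[pos + 1:end]
    args = []
    for _ in range(TAG_ARITY[tag]):            # read exactly 'arity' groups
        assert s[end] == "{", "expected a {...} group"
        depth, start = 1, end + 1
        while depth:                           # balance nested braces
            end += 1
            depth += {"{": 1, "}": -1}.get(s[end], 0)
        args.append(s[start:end])
        end += 1                               # skip the closing brace
    return tag, args, end
```

A 3-arity tag such as \withtype consumes three groups; with arity 1 the same input would stop after the first group, leaving the rest for the surrounding parse, which is exactly why the arity must be known up front.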
## Re: Design of Semantic Latex

Axiom calls COQ (for Spad code) and ACL2 (for Lisp code) at build time in order to run proofs. It is hard enough trying to construct proofs by hand despite the notion that Spad is "nearly mathematical". Implementation details matter a lot. We do, however, have the types already available. Even with types the Spad-to-COQ gap is a long jump at best, a PhD at worst.

I'm not sure how a dictionary between automated theorem provers and latex would be useful. Fateman has already shown that raw latex lacks enough information for an unambiguous parse (see prior references).

I'm suggesting that the latex be "augmented" by a semantic tags package containing tags that do not change the print representation but contain additional semantic information. E.g., for Axiom, \withtype ...

$\withtype{ \int{sin(x)} }{x}{EXPR(INT)}$

prints as 'sin(x)' but becomes

integrate(sin(x),x)

for Axiom. That way the latex prints "as is" but can be post-processed by machine to feed CAS systems. Of course this would be a trial-and-error process as the CAS will inevitably fail on missing semantics, requiring yet-another-tag somewhere. There is no such thing as a simple job.

I have a trivial implementation working but there is much to do.

As for trying to feed a Deep Neural Network proofs... I have spent a fair amount of time on DNNs (part of the TIRES project, see prior refs). I am clueless how they could possibly be applied to proofs.

Tim

On Wed, Oct 5, 2016 at 3:08 AM, James Davenport wrote:

> Sorry – I’ve not been much involved: other projects. But I just saw this – haven’t looked in any detail yet.
>
> DeepAlgebra - an outline of a program
> Authors: Przemyslaw Chojecki
> Categories: cs.AI math.AG
> Comments: 6 pages, https://przchojecki.github.io/deepalgebra/
>
> We outline a program in the area of formalization of mathematics to automate theorem proving in algebra and algebraic geometry. We propose a construction of a dictionary between automated theorem provers and (La)TeX exploiting syntactic parsers. We describe its application to a repository of human-written facts and definitions in algebraic geometry (The Stacks Project). We use deep learning techniques.
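As a sketch, the transparent-tag idea discussed in this thread could be realized with a macro that typesets only its first argument. This \withtype definition is an assumption of mine; the thread does not give the actual package implementation:

```latex
% Hypothetical definition: typeset argument 1, discard the semantic
% arguments 2 and 3, so the printed page is unchanged.
\newcommand{\withtype}[3]{#1}

% Prints just the integrand expression; a post-processor reads all
% three arguments and could emit  integrate(sin(x),x)  for Axiom.
$\withtype{\int \sin(x)\,dx}{x}{EXPR(INT)}$
```

Because TeX never sees the extra arguments in the output, documents remain camera-ready while the machine-readable semantics ride along in the source.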