Design Thoughts on Semantic Latex (SELATEX)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.

_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Raymond Rogers-3

I agree that this is doable and would be useful; but I would include a built-in (or separate) lint that gives a context for troubleshooting when the 57% (or whatever) occurs. 

Ray


On 08/18/2016 02:45 PM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.


_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer

-- 
 Two views on life:
life is an art not to be learned by observation.
George Santayana:Interpretations of Poetry and Religion
It's kinda nice to participate in your life
Raymond Rogers 

_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Richard Fateman-2
In reply to this post by Tim Daly
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF



On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.


_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.



_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.




_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Dan Zwillinger

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.





_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer

Zwillinger_GR6_essay.pdf (345K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.






_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.







_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.








_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.









_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.










_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

William Sit-3

Hi Tim:


Would it be simpler to only add semantic markups to algorithmic descriptions in papers? Authors can be asked to provide a separate chunk with [Axiom] semantic markups (in essence, a skeleton implementation or pseudo-code of the algorithm involved---skeleton because the data structures of mathematical objects are usually ignored in a math paper).  This would avoid having to mess with the latex source (already hard to read sometimes) or to "weave" to remove the semantic markups to recapture the latex: all that is needed would be to ignore the semantic chunk). Put another way, the semantic chunk is a direct (by hand or automatic) translation of the latex version of an algorithm chunk.


Also, what would be the scope of the added semantic macros in LaTeX (like \AD, \INT)? Can their scope be limited only to semantic chunks?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031
homepage: wsit.ccny.cuny.edu

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="&#43;16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.










_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

William Sit-3
In reply to this post by Tim Daly

​Hi Tim:


A question: how would you handle overloading of operators like * ("multiplication") in a semantic mark-up?  Need the markup be as detailed as the compiler requires or just sloppy enough that the interpreter can figure out the correct semantic?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031
homepage: wsit.ccny.cuny.edu

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="&#43;16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.










_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
In reply to this post by William Sit-3
William,

It is unlikely that authors will provide a special chunk for Axiom in papers.

Such an ability already exists but I don't expect anyone will adopt it.
The \usepackage{axiom} and \begin{chunk} / \end{chunk} pair work and is
all that is needed.

The primary target of this effort (although not restricted to them) are the
various collections (NIST/CRC/G&R/Jeffrey/etc) of formulas. I use these
references to create the computer algebra test suite but it takes months
to do this by hand. I also use them to build regression tests for Axiom.

There have been various attempts to extract semantics from latex. Some
are quite interesting (see http://mathpix.com). Unfortunately, there isn't
enough information in the latex. For instance, are your formulas given
over the real or complex domain?

In the longer term I am campaigning to bend these tomes toward a
more computational mathematics basis. Instead of listing the names of
20 invariant graph algorithms we really need reference versions of the
algorithms. And we need them in machine-readable form. And we need
them now so a whole generation of computational mathematicians do
not write yet-another-CAS from scratch.

Tim









On Thu, Aug 25, 2016 at 9:13 AM, William Sit <[hidden email]> wrote:

Hi Tim:


Would it be simpler to only add semantic markups to algorithmic descriptions in papers? Authors can be asked to provide a separate chunk with [Axiom] semantic markups (in essence, a skeleton implementation or pseudo-code of the algorithm involved---skeleton because the data structures of mathematical objects are usually ignored in a math paper).  This would avoid having to mess with the latex source (already hard to read sometimes) or to "weave" to remove the semantic markups to recapture the latex: all that is needed would be to ignore the semantic chunk). Put another way, the semantic chunk is a direct (by hand or automatic) translation of the latex version of an algorithm chunk.


Also, what would be the scope of the added semantic macros in LaTeX (like \AD, \INT)? Can their scope be limited only to semantic chunks?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.











_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly
In reply to this post by William Sit-3
Suppose the latex has MN.

This might be a symbol, two integers or two matrices.

The semantic markup, in an Axiom context, would provide enough
information to disambiguate the implied multiplication as well as the
overload of the multiplication.

First, the modified markup puts an "insert" call to add the implied
multiplication. \AI takes an expression, in this case "*".

M\AI{*}N still formats as MN but the weaver sees the M\AI{*}N text.

Second, for Axiom. the "type" call can specify the "overload" to use.
\AT takes an expression and a type target.

\AT{M\AI{*}N}{SQMATRIX(2,INT}}

which also formats as MN since the markup does not change it.

The weaver program sees that this is a multiplication from
SquareMatrix(2,INT), that is, two-by-two matrix multiplication with
Integer entries.

By adding the \AI and \AT markup we can know that this is a
matrix multiplication, not a symbol or integer multiply.

If M and N were previously marked as type SQMATRIX(2,INT) it
would not be necessary to add any markup at all since the weaver
already knows that MN must be a matrix multiply.

Tim


On Thu, Aug 25, 2016 at 9:22 AM, William Sit <[hidden email]> wrote:

​Hi Tim:


A question: how would you handle overloading of operators like * ("multiplication") in a semantic mark-up?  Need the markup be as detailed as the compiler requires or just sloppy enough that the interpreter can figure out the correct semantic?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.











_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

William Sit-3
In reply to this post by Tim Daly

Tim:


Thanks for taking the time to address my ignorant questions.

You wrote: "For instance, are your formulas given over the real or complex domain?​"


This question is of course relevant in computation. Even in Axiom, which allows domains (belonging to a specific Axiom category) as parameters in function or package calls, the compiler needs to know the exact domain at compile time (and with some more effort, delay this knowledge to run time). Your example with \AT for matrix multiplication MN illustrates that. 


However, mathematics is different. We do NOT have to name any specific domain. We can say, the algorithm works for any field k. How do you turn that into computer code without making a choice for k? Or, we can say the algorithm works for any matrix in GL(n, k), for any positive integer n over any field k.


Your answer to the question on overloading is of course the "middle way", but the problem above (unspecified domains in a category, or element in a domain) could cascade and so there has to be a non-specific translation (or way to mark-up), perhaps with a "default" specification in case computation becomes necessary.


Somehow I got (perhaps incorrectly) the impression that your proposed target is weaver(latex paper, axiom paper)---of course, a paper is also a string. 


For a limited application (formulas like integrals), such generality is perhaps not needed. For that purpose, I do not believe we need a new semantic mark-up layer---if I follow your progress correctly, you already have a direct [semi-automatic?] translation program (or a bunch of macros) that inputs the latex source for a formula (or a scanned image with "mathematical OCR" software) and outputs the Axiom code (or better still, an Axiom package that allows [domain] parameters). As you acknowledged, the selatex test file with weaver(latex string, axiom string) does not yet provide the semantic content (that's the semi-automatic part: choosing default domains). Why do we need to "unweave" an axiom string with semantic mark-up back to latex (with or without semantic)? Is it to ensure that weave has an inverse? I don't see that to be the case, since we have to make choices for domains to give full computational semantics but don't for in the latex string, even including full mathematical semantics. I think weave is one to many in general, but unweave can be one to one and thus possibly loses the generality of input latex string given to the weave routine.


William



William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031
homepage: wsit.ccny.cuny.edu

From: Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 2:50 PM
To: William Sit
Cc: Dan Zwillinger; Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
William,

It is unlikely that authors will provide a special chunk for Axiom in papers.

Such an ability already exists but I don't expect anyone will adopt it.
The \usepackage{axiom} and \begin{chunk} / \end{chunk} pair work and is
all that is needed.

The primary target of this effort (although not restricted to them) are the
various collections (NIST/CRC/G&R/Jeffrey/etc) of formulas. I use these
references to create the computer algebra test suite but it takes months
to do this by hand. I also use them to build regression tests for Axiom.

There have been various attempts to extract semantics from latex. Some
are quite interesting (see http://mathpix.com). Unfortunately, there isn't
enough information in the latex. For instance, are your formulas given
over the real or complex domain?

In the longer term I am campaigning to bend these tomes toward a
more computational mathematics basis. Instead of listing the names of
20 invariant graph algorithms we really need reference versions of the
algorithms. And we need them in machine-readable form. And we need
them now so a whole generation of computational mathematicians do
not write yet-another-CAS from scratch.

Tim









On Thu, Aug 25, 2016 at 9:13 AM, William Sit <[hidden email]> wrote:

Hi Tim:


Would it be simpler to only add semantic markups to algorithmic descriptions in papers? Authors can be asked to provide a separate chunk with [Axiom] semantic markups (in essence, a skeleton implementation or pseudo-code of the algorithm involved---skeleton because the data structures of mathematical objects are usually ignored in a math paper).  This would avoid having to mess with the latex source (already hard to read sometimes) or to "weave" to remove the semantic markups to recapture the latex: all that is needed would be to ignore the semantic chunk). Put another way, the semantic chunk is a direct (by hand or automatic) translation of the latex version of an algorithm chunk.


Also, what would be the scope of the added semantic macros in LaTeX (like \AD, \INT)? Can their scope be limited only to semantic chunks?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="&#43;16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.











_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer
Reply | Threaded
Open this post in threaded view
|

Re: Design Thoughts on Semantic Latex (SELATEX)

Tim Daly

Extracting Mathematical Semantics from LATEX Documents

Jurgen Stuber and Mark van den Brand,


Abstract. We report on a project to use SGLR parsing and term rewriting with ELAN4 to extract the semantics of

mathematical formulas from a LATEX document and representing them in MathML. The LATEX document we used 

is part of the Digital Library of Mathematical Functions (DLMF) project of the US National Institute of Standards 

and Technology (NIST) and obeys project-specific conventions, which contains macros for mathematical

constructions, among them 200 predefined macros for special functions, the subject matter of the project. The

SGLR parser can parse general context-free languages, which suffices to extract the structure of mathematical

formulas from calculus that are written in the usual mathematical style, with most parentheses and multiplication

signs omitted. The parse tree is then rewritten into a more concise and uniform internal syntax that is used as the

base for extracting MathML or other semantical information.



On Thu, Aug 25, 2016 at 8:05 PM, Tim Daly <[hidden email]> wrote:
Hmmmm. I'm not communicating the idea clearly...

=======
>However, mathematics is different. We do NOT have to name any
>specific domain. We can say, the algorithm works for any field k.
>How do you turn that into computer code without making a choice
>for k? Or, we can say the algorithm works for any matrix in GL(n, k),
>for any positive integer n over any field k.

Recall that the focus is (currently) on importing formulas from CRC.

At import time the value can be symbolic but at evaluation time the
field needs to be specific. Simplifications available over C might not
be available over R. The integral may be valid over both fields but the
result of the integration might be different depending on that choice.

If the formula is valid for any field there is no need to specify the \AT
type. Often, though, the type is implicit. For example, in quantum
mechanics the matrix has to be 2x2 and unitary. This implicit type
might be known to the quantum mechanics domain functions so it
might not need markup but general linear algebra functions might
need to know.

Also, there is no need to specify the field. If k can be any field, the
markup could just say 'Field", using the Axiom category name. Why
is this a problem?


=======
>Somehow I got (perhaps incorrectly) the impression that your
>proposed target is weaver(latex paper, axiom paper)---of course,
>a paper is also a string.

The proposed target is a CRC formula. There are 10,000 of them on my
shelf and not enough lifetime to retype them. The goal is to make the
mathematics available to computational mathematics programs.
How can I drag-and-drop a formula?

CRC and NIST have their own target audience, of course, and it is
not (currently) computational mathematics programs. I expect this to
change in the future as the need arises (and I need it now).


=======
>For a limited application (formulas like integrals), such generality is
>perhaps not needed. For that purpose, I do not believe we need a
>new semantic mark-up layer---if I follow your progress correctly, you
>already have a direct [semi-automatic?] translation program (or a
>bunch of macros) that inputs the latex source for a formula (or a
>scanned image with "mathematical OCR" software) and outputs
>the Axiom code (or better still, an Axiom package that allows
>[domain] parameters).

Umm, no. Fateman (see prior emails) shows that Latex does not have
sufficient semantic information. The OpenMath effort has tried to do
cross-platform communication and latex will not support communicating
the semantics. Problems, such as choice of branch cuts, result in
different answers even for fundamentals like log. So, no, there is no
existing translation program. In fact, there can't be one based solely
on existing latex markup.

This has not been a problem in the past. Humans bring a lot of semantics
to their search of CRC. But a computational mathematics program does
not. It needs more explicit markup.


=======
>As you acknowledged, the selatex test file with
>weaver(latex string, axiom string) does not yet provide the semantic
>content (that's the semi-automatic part: choosing default domains).

The test file is intended to drive progress in this problem. The idea
is that the latex string needs minimal markup to generate the axiom
string... what is that markup? The test file will be changed to make
that clear.

Choosing domains is only one semantic problem. The MN latex
example shows that the syntax is ambiguous. In general, in any
textbook the semantics is in the surrounding paragraphs.

Take any textbook with formulas (e.g. physics), extract only
the formulas, and a LOT of information is lost. in $E=mc^2$,
is c a constant? Can you tell from the formula alone?


========
>Why do we need to "unweave" an axiom string with semantic
>mark-up back to latex (with or without semantic)? Is it to ensure
>that weave has an inverse? I don't see that to be the case, since
>we have to make choices for domains to give full computational
>semantics but don't for in the latex string, even including full
>mathematical semantics. I think weave is one to many in general,
>but unweave can be one to one and thus possibly loses the
>generality of input latex string given to the weave routine.

Axiom currently outputs formula latex without semantics.

Suppose that you wanted to create the CRC tables using Axiom
to generate the formulas in a systematic way. Wouldn't it be
convenient if the semantic markup was automatically inserted
rather than having to do that by hand? Should Axiom say what
it knows?

As you note, the Axiom-generated formula might be true for any
field, not just the current one. That's pretty easy to handle. You
just call a routine that either generalizes the output to be "Field"
or one that elides the \AT type markup completely.

We don't NEED "unweave" or Axiom semantic latex output.
The mathematician in me wants it.

Tim
 

On Thu, Aug 25, 2016 at 5:59 PM, William Sit <[hidden email]> wrote:

Tim:


Thanks for taking the time to address my ignorant questions.

You wrote: "For instance, are your formulas given over the real or complex domain?​"


This question is of course relevant in computation. Even in Axiom, which allows domains (belonging to a specific Axiom category) as parameters in function or package calls, the compiler needs to know the exact domain at compile time (and with some more effort, delay this knowledge to run time). Your example with \AT for matrix multiplication MN illustrates that. 


However, mathematics is different. We do NOT have to name any specific domain. We can say, the algorithm works for any field k. How do you turn that into computer code without making a choice for k? Or, we can say the algorithm works for any matrix in GL(n, k), for any positive integer n over any field k.


Your answer to the question on overloading is of course the "middle way", but the problem above (unspecified domains in a category, or element in a domain) could cascade and so there has to be a non-specific translation (or way to mark-up), perhaps with a "default" specification in case computation becomes necessary.


Somehow I got (perhaps incorrectly) the impression that your proposed target is weaver(latex paper, axiom paper)---of course, a paper is also a string. 


For a limited application (formulas like integrals), such generality is perhaps not needed. For that purpose, I do not believe we need a new semantic mark-up layer---if I follow your progress correctly, you already have a direct [semi-automatic?] translation program (or a bunch of macros) that inputs the latex source for a formula (or a scanned image with "mathematical OCR" software) and outputs the Axiom code (or better still, an Axiom package that allows [domain] parameters). As you acknowledged, the selatex test file with weaver(latex string, axiom string) does not yet provide the semantic content (that's the semi-automatic part: choosing default domains). Why do we need to "unweave" an axiom string with semantic mark-up back to latex (with or without semantic)? Is it to ensure that weave has an inverse? I don't see that to be the case, since we have to make choices for domains to give full computational semantics but don't for in the latex string, even including full mathematical semantics. I think weave is one to many in general, but unweave can be one to one and thus possibly loses the generality of input latex string given to the weave routine.


William



William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031

From: Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 2:50 PM
To: William Sit
Cc: Dan Zwillinger; Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]

Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
William,

It is unlikely that authors will provide a special chunk for Axiom in papers.

Such an ability already exists but I don't expect anyone will adopt it.
The \usepackage{axiom} and \begin{chunk} / \end{chunk} pair work and is
all that is needed.

The primary target of this effort (although not restricted to them) are the
various collections (NIST/CRC/G&R/Jeffrey/etc) of formulas. I use these
references to create the computer algebra test suite but it takes months
to do this by hand. I also use them to build regression tests for Axiom.

There have been various attempts to extract semantics from latex. Some
are quite interesting (see http://mathpix.com). Unfortunately, there isn't
enough information in the latex. For instance, are your formulas given
over the real or complex domain?

In the longer term I am campaigning to bend these tomes toward a
more computational mathematics basis. Instead of listing the names of
20 invariant graph algorithms we really need reference versions of the
algorithms. And we need them in machine-readable form. And we need
them now so a whole generation of computational mathematicians do
not write yet-another-CAS from scratch.

Tim









On Thu, Aug 25, 2016 at 9:13 AM, William Sit <[hidden email]> wrote:

Hi Tim:


Would it be simpler to only add semantic markups to algorithmic descriptions in papers? Authors can be asked to provide a separate chunk with [Axiom] semantic markups (in essence, a skeleton implementation or pseudo-code of the algorithm involved---skeleton because the data structures of mathematical objects are usually ignored in a math paper).  This would avoid having to mess with the latex source (already hard to read sometimes) or to "weave" to remove the semantic markups to recapture the latex: all that is needed would be to ignore the semantic chunk). Put another way, the semantic chunk is a direct (by hand or automatic) translation of the latex version of an algorithm chunk.


Also, what would be the scope of the added semantic macros in LaTeX (like \AD, \INT)? Can their scope be limited only to semantic chunks?


William


William Sit
Professor Emeritus
Department of Mathematics
The City College of The City University of New York
New York, NY 10031

From: Axiom-developer <axiom-developer-bounces+wyscc=[hidden email]> on behalf of Tim Daly <[hidden email]>
Sent: Thursday, August 25, 2016 6:17 AM
To: Dan Zwillinger
Cc: Richard Fateman; James Davenport; [hidden email]; Mike Dewar; axiom-dev; [hidden email]
Subject: Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX)
 
My initial approach was too heavy-handed and Axiom specific.

It seems the semantic markup task can be viewed as an editor
task. Editors don't care about semantics, they just work on text.
Viewed this way the markup's only function is decoration for
post-processing. It is (mostly) system independent.

The edit tasks seem to be {delete, insert, replace} and some
post-markup hints {function, type}

The delete markup tells weaver to remove the latex completely.
This is useful for things like {dx}

The insert markup adds missing semantic text, such as {*} which
is needed to indicate multiplication.

The replace markup gives alternate text for weaver, for things where
the function name might differ, e.g. \int -> integrate

The function markup names a function that weaver should call
which allows special handling in post-processing. It can be any
s-expression (the weaver implementation is currently lisp-based).

The type markup passes type information to weaver so Axiom
knows the target type, useful for things like matrix.

The macros are

\newcommand{\AD}[1]{#1}% delete
\newcommand{\AI}[1]{}% insert
\newcommand{\AR}[2]{#1}% replace

\newcommand{\AF}[2]{#1}% function
\newcommand{\AT}[2]{#1}% type

Note that \AI outputs nothing, so 3\AI{*}x == 3x to latex.

\AD tells weaver to delete the text, e.g. {\AD ~dx} == ~dx to latex

\AR tells weaver to replace the text e.g. {\AR \pi}{\%pi}

\AF tells weaver to call a function, e.g. one that knows how to
rewrite the input in a special way, or does tracing, etc.

\AT adds target type information for Axiom
e.g. \AT{3x+6}{POLY(INT)} == 3x+6 to latex but passes it as
a Polynomial(Integer) to Axiom

For example,

$\int{\frac{dx}{ax*b}}$

becomes

$\AT{\AR{\int}{integrate}{\frac{\AD{dx}}{a\AI{*)x+b}}}{EXPR}{INT}}$

telling weaver that the target type (AT) is EXPR(INT),
the \int is really integrate
the dx is to be ignored and
the ax+b should read a*x+b

There is an obvious tradeoff of markup vs weaver.
For example. \int might be known to weaver.
Or expressions might call an equation rewriter to add {*}

The markup could vary from almost nothing to massive detail
depending on the downstream cleverness.

This initial markup set seems sufficient to handle every task
that requires semantics markup so far. The overhead seems
small and the gain seems large.

Now the only problem is post-processing the latex. Sigh.

There is no such thing as a simple job.

Tim


On Tue, Aug 23, 2016 at 7:27 PM, Tim Daly <[hidden email]> wrote:
For those of you at home wishing to play along, there is a
selatex.test1 file at
http://axiom-developer.org/axiom-website/selatex.test1
containing 620 integrals.

Each line is a call of the form

  weaver(latex-string,axiom-string)

The goal is to transform the latex into Axiom.

Implicit is the idea that weaver will use the selatex tokens
to disambiguate the input. The current file has no selatex
tokens. They will be added as needed. The idea is to keep
the problem simple by adding print-invisible sematics to the
latex-string. In the ideal case the weaver program is trivial,
as is the markup. Any tradeoff should prioritize simplicity.
Another priority is to align the semantic markup with
Axiom domains in order to ground the semantics with code.

Once all of these calls translate correctly the Axiom output
routines need to output the latex-string with the added
semantic markup so the mapping is bi-directional.

The current file only looks at integration as I already have
the latex->axiom text available. Future test files will look
at other areas of interest. The long term goal is to parse
NIST/CRC/etc formulas.

Tim


On Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <[hidden email]> wrote:
Dan,

While paging through the CRC 31st Standard Mathematical
Tables I landed on page 219, section 3.4.1.2 Graph Invariants.

It would be a vast improvement if there were algorithms
associated with these invariants. Clearly they exist somewhere.

To "cross the gap" between tables and computational mathematics
it would be valuable to include implementations of these invariants.

It is hard to walk away from that page. An Axiom implementation
would be fun to write, especially given the next section that lists
different kinds of graphs which, presumably, would all have the
invariants. Even better, the graph algorithms are likely good
candidates for proof technology (ACL2 if done in Lisp, COQ if
done in Spad). Lisp has the advantage of an ANSI standard.

It seems worthwhile to take sections like this, expand them
across computational and proof tools, and publish them in a
form that is generally useful. It is "nice to know" that a graph
has a radius but it would be even better if I could "just point and
click" to import the algorithm.

Axiom has been pushing literate programming for years. The
tools exist to "make it so", as the saying goes.

Tim




On Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <[hidden email]> wrote:
Like any research problem it is a struggle to get a useful grip on this.
Looking at G&R (I just ordered the latest, mine is 4th edition), the
task quickly gets out of hand.

The CATS tests in the past were created by reading the printed latex
in various volumes and hand-translating them to Axiom input.

It is not difficult to re-create the latex input for these examples.
Doing so and combining the results gives a set of examples with
matching input latex and output Axiom. The homework problem is
to write the necessary markup and weaver.

It is immediately obvious that this is more challenging than it seems.
For example, when writing y'(x)=0, Axiom needs y:=operator 'y
so it knows about the symbol as an operator. This falls under
"Consideration 12: System Specific Commands"... which implies
that the latex environment and quoting macros have to be
implemented. Sigh.

There is no such thing as a simple job.

Anyway, at least there is a way to make a proof of concept
prototype that reproduces existing results.

Tim


On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <[hidden email]> wrote:
Dan,

Welcome.

Re: Howard Cohl. Yes, I'd like an introduction. It seems important to
make DLMF, CRC, and other sources contain enough semantics that
they can be read by a computer algebra system. There are an
enormous number of issues, such as what to do with functions
unknown to the CAS, which need to be thought through.

I believe that NIST/CRC/G&R collections with semantic markup will
have a great normalizing effect on CAS development since it will raise
cross-platform questions like "What percent of G&R do you handle?".
Albert Rich (RUBI)[0] has been doing this for integration using patterns.
This can only benefit computational mathematics in the long term.

I've also campaigned for associating algorithms with published tables.
It is important in the long term to have reference versions. The ACM
used to do this years ago. I'd like to see a Gruntz algorithm for limits
where it can be applied, for instance. It would also provide a focus on
"missing algorithms" or edge cases. Davenport/Trager/Bronstein
algorithms promise a decision procedure but there are no existing
complete implementations. The tables could highlight missing cases,
giving focus to efforts to complete the procedure.

It will also put back-pressure on the tables to define different versions
of the same formulas based on domains (C, R, etc).


"The GR work was more than I had anticipated"... wins the award for
understatement of the decade.

The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.

I've spent well over a year creating the computer algebra test suite.
It would be so much easier and more useful if the original sources
could be read directly.

I read your paper. There is an interesting mix of syntax and semantics.

I guess the difference in this effort is that the semantic markup is
intended to be transparent and grounded. The transparent aspect
keeps the tables unchanged in print form. The grounded aspect keeps
the semantics based on some algorithmic base. This implies, of
course, that there IS an algorithmic base which does not yet exist
for some of the work.

As you can see from the CATS work I've been trying to validate
Axiom's results with respect to published texts. This has found both
Axiom bugs and misprints in published texts.

The Kamke[1] suite was the first effort for differential equations.

The Spiegel[2] chapter 14 on indefinite integrals for integration.

The von Seggern[3] book on curves and surfaces for graphics.

The Legendre and Grazini[4] on Pasta by Design for 3D graphics.

The RUBI work on integration.

and, currently I'm re-creating the numerics that were lost when NAG
released the open source version, leaving me swimming through
Luke's[5] Algorithms book.

which, to quote a famous phrase "was more than I had anticipated".

Your Handbook of Integration[6] has a section on various known
"Caveats, How an integration result may be incorrect". This raises the
wonderful topic of branch cuts yet again. I did some testing and it
seems that Axiom and Mathematica share one set while Maple and
Maxima share another.

All of which leads to a need to create better reference materials that
are generally available (unlike the ACM algorithms for non-paying
customers) and directly useful for computational mathematics.

The current plan is to take some tables, find or re-create the latex,
invent a semantic markup, and then write the "weaver". At this point
the research is still at the "proof of concept" stage. (tex formula
sources are most welcome).

Ultimately I'd really like to see a book of formulas and algorithms
that I can just drag-and-drop into Axiom and be able to use them
without lifetimes of work.

Actually, that 's only the penultimate goal. I have augmented
Axiom to include proofs (ACL2,COQ) so I'd also like to see proofs,
(this IS mathematics, after all) but maybe we'll leave that for
next month :-)

Tim

[0] Rich, Albert "Rule-based Mathematics"
http://www.apmaths.uwo.ca/~arich/

[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"
Chelsea Publishing Company, New York, 1959

[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's Outline
Series; McGraw-Hill Book Company 1968

[3] von Seggern, David "CRC Standard Curves and Surfaces",
CRC Press, 1993 ISBN 0-8493-0196-3

[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",
Thames and Hudson, 2001

[5] Luke, Yudell "Algorithms for the Computation of Mathematical
Functions", Academic Press, 1977 ISBN 0-12-459940-6

[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,
London, 1992
 


On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <[hidden email]> wrote:

All

I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.

I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:

(1) CRC SMTF (Standard Mathematical Tables and Formula)

I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done

I let Richard Fateman use them for his experiments.


(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")

I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.


A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.

While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.


Dan Zwillinger
[hidden email]
<a href="tel:617-388-2382" value="+16173882382" target="_blank">617-388-2382
On 8/20/2016 11:30 PM, Tim Daly wrote:
The game is to define latex markup that is transparent to the syntax
but adds semantics for post processing.

Some simple tests show that this works. Suppose selatex.sty contains:

\newcommand{\INT}[1]{#1}
\newcommand{\VARIABLE}[1]{#1}
\newcommand{\POLY}[1]{#1}
\newcommand{\INTEG}[2]{\int{#1}}

This defines 4 new latex markups. The number in the square brackets
defines the number of expected arguments. The brace argument
delimites the characters that will occur during expansion with the #1
replaced by the first argument.

(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)

Notice that \INTEG takes two arguments but will display only one.
This allows the variable of integration to be passed in the semantics
without showing up in the output. This allows the semantics to carry
additional, non-display information needed by the CAS.

Some examples follow.

An integer 3 can be wrapped as \INT{3} but will still display as 3.

A variable x can be wrapped as \VARIABLE{x}, displayed as x.

$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
as $\int{3x~dx}$, that is, an
  (integralsign) 3x dx
but notice that the variable of integration is in the semantic markup.

These trivial macros can be made less verbose and certainly
more clever but that's not the point being made here.

A 'weaver' program would see the integration expression as

$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$

with all of the semantic tags. The weaver's job is to rewrite this
expression into an inputform for the CAS. In Axiom that would be

integrate(3*x,x)

The semantics markup makes the display pretty and the semantic
parsing possible. Depending on the system, more or less parsing
markup can exist. Axiom, for example, would not need the \INT or
\VARIABLE to get a correct parse so the expression could be
$\INTEG{3x~dx}{x}$

This validates the fundamental idea.

The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.

Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).

Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.

An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.

Infinitesimal progress, but progress non-the-less.

Tim


On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <[hidden email]> wrote:

One of the Axiom project goals is to develop a "Computer Algebra Test
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.

As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to be
clear about semantics.

In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).

Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.

===========
Consideration 10: semantic->inputform translation (weaver? :-) )

>x and not x   has no particular meaning,  but if x is explicitly true or false,
>Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
>defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.

I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.

The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground

\BOOLEAN(a=b)
\EQ(a=b)

are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.

The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.

Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.

=============
Consideration 11: \scope in selatex

>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....")  and maybe even a need
>to make value assignments. 
>I think you need to model these in SEALATEX too.

(See Consideration 6)

Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])

There seem to be three concerns.

First is the scope name, with something like 'global' as a keyword.
Second is the "closure chain" of other scopes.
Third is the symbol being scoped.

\scope{name}{chain}{symbol}

The weaver program would walk this chain to create the proper
file syntax for system input.

============
Consideration 12: System specific commands \axiom

Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.

\axiom{)clear all}  %clear the workspace

which would simply quote an input line.

==============

Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).

It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".

On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.

===============

[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)

[1] Wester, Michael J. and Daly, TImothy "Rosetta"



On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <[hidden email]> wrote:
thanks for all the references :)

I'm not sure if I'm going to repeat comments I  made already somewhere.
First,  has Dan Zwillinger weighed in?  I think that it would be useful
to see what he has done.

Next, there are ambiguities among CAS and even within a single CAS.

For example, in Macsyma/ Maxima  there is generally no semantics
associated with "=" or ">".   But in some contexts, there is some meaning.

x>2*y

is a tree expression.  It is not associated with x or with y.

assume(x>2*y)   does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task.   In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.

x and not x   has no particular meaning,  but if x is explicitly true or false,
Maxima simplifies it to false.  If SEALATEX has a semantics -- are you
defining yet another CAS?  Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.

As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....")  and maybe even a need
to make value assignments. 
I think you need to model these in SEALATEX too.

Just musing about where you are heading.

RJF




On 8/18/2016 11:45 AM, Tim Daly wrote:
Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.

Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.

Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)

This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.

It IS a design goal to support a Computer Algebra Test Suite
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.

===========
Consideration 4: I/O transparency

Assume for the moment that we take a latex file containing
only formulas. We would like to be able to read this file so
it has computational mathematics (CM) semantics.

It is clear that there needs to be semantic tags that carry the
information but these tags have to be carefully designed NOT
to change the syntactic display. They may, as noted before,
require multiple semantic versions for a single syntax.

It is also clear that we would like to be able to output formulas
with CM semantics where currently we only output syntax.

===========
Consideration 5: I/O isomorphism

An important property of selatex is an isomorphism with
input/output. Axiom allows output forms to be defined for a
variety of targets so this does not seem to be a problem. For
input, however, this means that the reader has to know how
to expand \INT{3} into the correct domain. This could be done
with a stand-alone pre-processor from selatex->inputform.

It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.

That might not mean that the I/O is identical though due to
things like variable ordering, etc.

===========
Consideration 6: Latex semantic macros

Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment

   TYP:=FRAC(POLY(INT))

or macro form

   TYP ==> FRAC(POLY(INT))

Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as

  \FRINT ==> \FRAC\INT

===========
Consideration 7: selatex \begin{semantic} environment?

Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups

   \begin{chunk}{a name for the chunk}
      anything
   \end{chunk}

We could provide a similar environment for semantics such as

  \begin{semantics}{a name for the block}
  \end{semantics}

which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.

===========
Consideration 8: Latex-time processing

Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).

It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.

This would help a stand-alone selatex->inputform preprocessor.

===========
Consideration 9: Design sketches

It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.

It would seem useful to develop an incremental test suite that
starts with "primitive" domains (e.g. INT), creating selatex I/O.

Once these are in place we could work on "type tower" markup
such as \FRAC\INT or \POLY\COMPLEX\FLOAT.

Following that might be pre-existing latex functions like \int, \sum,
\cos, etc.

To validate these ideas Axiom will include an selatex.sty file and
some unit tests files on primitive domain markup. That should be
enough to start the bikeshed discussions.

Ideas? Considerations? Suggestions?

Tim

[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts on
Encoding Mathematics, January, 2001"
https://people.eecs.berkeley.edu/~fateman/papers/openmathcrit.pdf

[1] Fateman, Richard J.
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to
[2] Fateman, Richard J.













_______________________________________________
Axiom-developer mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/axiom-developer