James Carlson

@jxxcarlson

Towards LaTeX in the Browser

The way we used to do it.

TeX (/ˈtɛx/ tekh as in Greek)

MiniLatex LiveDemo App II

Recent news

Disappointed in the appearance and esthetics of the proofs for the second volume of his Art of Computer Programming, the famed computer scientist Donald Knuth set out to build the right tool for setting mathematical text, a job he felt would take about six months. This was 1977. By 1978, he had a working program, baptized TeX, and in 1989 it was released in final form. It has been the foundation of mathematical typesetting ever since. For example, to produce the formula for the Gaussian integral,

one writes the incantation

$$
\int_{-\infty}^\infty e^{-x^2/2} dx = \sqrt{2\pi}
$$

Knuth’s program plays the role of master magician, turning the incantation into a beautifully typeset formula. It may seem retrograde, or maybe just retro, to write such an arcane sequence of symbols to obtain the desired artistry. Surely in the age of the mouse and the touch screen, there is something better! Perhaps, but those who do mathematics day in and day out do it this way, not out of ignorance or stubbornness, but because it is the BETTER WAY: faster once you know what you are doing, guaranteed to give you a beautiful result without messing around, and able to handle the weirdest mathematics that you may discover or invent.

MathJax

TeX, LaTeX, and PostScript taken together solve the problem of composing mathematical text for print. Add PDF to the mix, and the problem of distributing whole mathematical texts electronically is also solved. But what about the World Wide Web? Can one edit and display LaTeX in the browser?

MathJax (see mathjax.org) solves half of the browser problem: the display of mathematical text per se, that is, the text fenced off by dollar signs or double dollar signs, as in the example above. The results are beautiful, a tribute to the skill and craft of the MathJax team. MathJax is how many blogs on the internet handle mathematical text.

Nonetheless, there is a gap, because LaTeX does more than display formulas. There are automatically numbered sections and subsections, equations that are cross-referenced by a convenient system of labels, special “environments” for theorems, itemized lists, verbatim text, etc., etc. So the question is: what can be done for the part of LaTeX outside the dollar signs?

MiniLaTeX

Reproducing all of LaTeX in the browser is too much to ask. It is not too much, however, to define a reasonable subset that can be processed either with the usual tools or with a new browser-based tool. That is the aim of the MiniLaTeX project. While still experimental, the project shows promise. Look, for example, at these documents and drafts: MiniLaTeX, Infinity, Quantum Field Theory Notes, and Elm by Example. They were written with the editing tools at www.knode.io using MiniLaTeX, and so provide a rough proof of concept.

You can experiment with MiniLaTeX using the MiniLaTeX Demo App. To write documents using MiniLaTeX, try www.knode.io. As this project is very much in a fluid stage of development, I welcome (and need) comments, criticism, and bug reports (jxxcarlson at gmail).

Nuts and bolts

The MiniLaTeX project employs a divide-and-conquer strategy, using MathJax to render all of the text fenced off by dollar signs. To handle the part of LaTeX outside the fences, one uses a parser, a specialized program that reads the LaTeX source text and converts it to an intermediate form called an Abstract Syntax Tree (AST). The AST “knows” the grammar of LaTeX. It encodes this knowledge in a way that makes it easy for a second program, the renderer, to convert the tree into HTML, the language in which web pages are written. In a few words, it is all a matter of translating from one language to another.
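To make the divide-and-conquer idea concrete, here is a minimal sketch in Python. The real MiniLaTeX code is written in Elm, and its parser recognizes math as part of the grammar rather than in a separate pass. The hypothetical function `split_math` cuts a source string into two kinds of segments:

- text segments, which would go to the LaTeX parser;
- math segments, which would be handed to MathJax untouched.

The sketch assumes that `$$ ... $$` is tried before `$ ... $`, and that escaped dollar signs (`\$`) never occur.

```python
import re
from typing import List, Tuple

# $$...$$ is listed first so display math is not mistaken for two inline fences.
# Assumption: escaped dollar signs (\$) never occur in the source.
MATH_FENCE = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def split_math(source: str) -> List[Tuple[str, str]]:
    """Return (kind, content) pairs, kind being "text", "inline", or "display"."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for m in MATH_FENCE.finditer(source):
        if m.start() > pos:
            segments.append(("text", source[pos:m.start()]))  # for the LaTeX parser
        if m.group(1) is not None:
            segments.append(("display", m.group(1)))  # for MathJax, as-is
        else:
            segments.append(("inline", m.group(2)))  # for MathJax, as-is
        pos = m.end()
    if pos < len(source):
        segments.append(("text", source[pos:]))
    return segments
```

Consider the input `Euler: $e^{i\pi} = -1$ and $$\int_0^1 x\,dx = 1/2$$ done.` It yields five segments: text, inline, text, display, text.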

Getting Technical

Let’s now get technical, or at least a little bit technical. For the real deal, see the MiniLaTeX Technical Description and Progress Report.

The first step in setting up a language-translation toolchain is to define the language to be translated. For this, it is a good idea to start with a programming language expressive enough to make writing a parser easy (not to mention possible). The language I chose is a recent one, Elm, created and developed by Evan Czaplicki, who conceived it as part of his 2012 senior thesis at Harvard University. Elm is a purely functional language for building front-end web apps. It compiles to JavaScript and has a number of remarkable properties:

(1) in practice, there are no runtime exceptions;
(2) the compiler gives the best error messages on the planet, turning it from an annoying nag into an incredibly friendly and helpful assistant;
(3) refactoring, even of the most drastic kind, is easy, and as a result so is maintaining code over the long run.

Oh, and one more thing: it is fast. The deciding factor for this project, however, is that Elm has excellent parser-writing tools, akin to the parser combinators of Haskell’s Parsec.
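For readers who have not met parser combinators, the following Python sketch shows the idea in miniature. It is not Elm's parser library and not MiniLaTeX code; all names here are hypothetical.

A parser is just a function from (source, position) to either a (value, new position) pair or None. Small parsers are glued together by higher-order functions such as `between`, `fmap` and `alt`.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# A parser maps (source, position) to (value, new_position), or None on failure.
ParserFn = Callable[[str, int], Optional[Tuple[T, int]]]


def literal(s: str) -> ParserFn[str]:
    """Succeed if the exact string s appears at the current position."""
    def p(src: str, pos: int) -> Optional[Tuple[str, int]]:
        return (s, pos + len(s)) if src.startswith(s, pos) else None
    return p


def take_while(pred: Callable[[str], bool]) -> ParserFn[str]:
    """Consume characters while pred holds; always succeeds, possibly with ""."""
    def p(src: str, pos: int) -> Optional[Tuple[str, int]]:
        end = pos
        while end < len(src) and pred(src[end]):
            end += 1
        return (src[pos:end], end)
    return p


def between(open_p: ParserFn, inner: ParserFn[T], close_p: ParserFn) -> ParserFn[T]:
    """Run open_p, inner, close_p in sequence and keep only inner's value."""
    def p(src: str, pos: int) -> Optional[Tuple[T, int]]:
        r = open_p(src, pos)
        if r is None:
            return None
        v = inner(src, r[1])
        if v is None:
            return None
        r = close_p(src, v[1])
        return None if r is None else (v[0], r[1])
    return p


def fmap(f: Callable[[T], U], parser: ParserFn[T]) -> ParserFn[U]:
    """Transform the value produced by a successful parse."""
    def p(src: str, pos: int) -> Optional[Tuple[U, int]]:
        r = parser(src, pos)
        return None if r is None else (f(r[0]), r[1])
    return p


def alt(*parsers: ParserFn[T]) -> ParserFn[T]:
    """Try each parser in order and return the first success."""
    def p(src: str, pos: int) -> Optional[Tuple[T, int]]:
        for parser in parsers:
            r = parser(src, pos)
            if r is not None:
                return r
        return None
    return p


not_dollar = take_while(lambda c: c != "$")
math_parser = alt(
    fmap(lambda t: ("DisplayMath", t), between(literal("$$"), not_dollar, literal("$$"))),
    fmap(lambda t: ("InlineMath", t), between(literal("$"), not_dollar, literal("$"))),
)
```

The order inside `alt` matters. If the inline parser came first, `$$a$$` would be read as an empty inline formula `$$` followed by leftover text.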

The MiniLaTeX language definition as it stands now is quite small, just nine lines of code:

type LatexExpression
    = LXString String
    | Comment String
    | Item Int LatexExpression
    | InlineMath String
    | DisplayMath String
    | Macro String (List LatexExpression)
    | Environment String LatexExpression
    | LatexList (List LatexExpression)
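As an illustration only, here is the same type transcribed into Python. Each Elm constructor becomes a dataclass, and a `Union` stands in for the Elm custom type. The field names (`text`, `level`, `body` and so on) are invented for the sketch. Reading the `Int` in `Item` as a nesting level is also an assumption.

The function `count_nodes` shows the typical pattern of recursing over such a tree, one case per constructor, much as an Elm `case` expression would.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass
class LXString:
    text: str


@dataclass
class Comment:
    text: str


@dataclass
class Item:
    level: int  # assumption: the Elm Int is the list nesting level
    body: LatexExpression


@dataclass
class InlineMath:
    tex: str


@dataclass
class DisplayMath:
    tex: str


@dataclass
class Macro:
    name: str
    args: List[LatexExpression]


@dataclass
class Environment:
    name: str
    body: LatexExpression


@dataclass
class LatexList:
    items: List[LatexExpression]


# The Elm custom type becomes a Union of the constructor classes.
LatexExpression = Union[LXString, Comment, Item, InlineMath, DisplayMath, Macro, Environment, LatexList]


def count_nodes(e: LatexExpression) -> int:
    """Count every node in the tree, one case per constructor."""
    if isinstance(e, (Item, Environment)):
        return 1 + count_nodes(e.body)
    if isinstance(e, Macro):
        return 1 + sum(count_nodes(a) for a in e.args)
    if isinstance(e, LatexList):
        return 1 + sum(count_nodes(x) for x in e.items)
    return 1  # leaves: LXString, Comment, InlineMath, DisplayMath
```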

The parser is also quite small: at present 304 lines (see the GitHub repository). Of course, there is more, because once a passage of MiniLaTeX source text has been parsed into an AST, it has to be rendered into HTML. The renderer is larger (548 lines), but it is much easier, indeed almost routine, to write. For more details, please see the MiniLaTeX Technical Description and Progress Report.

Peeking under the Hood

For those who would like to peek under the hood just a little more, consider this fanciful piece of LaTeX text:

This is MiniLaTeX:
\begin{theorem}
This is a test: $\alpha^2 = 7$ \foo{1}
\begin{a}
la di dah
\end{a}
\end{theorem}

The experimental parser, version 1.0, transforms it into the following AST:

Ok (LatexList ([
  LXString "This is MiniLaTeX:",
  Environment "theorem" (
    LatexList ([
      LXString "This is a test:",
      InlineMath "\alpha^2 = 7",
      Macro "foo" ["1"],
      Environment "a" (
        LatexList ([LXString "la di dah"])
      )]))]))
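To see how such a tree can be produced, here is a small hand-written recursive-descent parser in Python. It covers only the constructs used in the example:

- plain text;
- `$...$` and `$$...$$` math;
- macros with brace arguments;
- `\begin ... \end` environments.

It is a sketch under simplifying assumptions, not a port of the 304-line Elm parser. It does not handle nested braces inside macro arguments, comments, or items, and it raises errors as exceptions instead of returning a `Result`. Class and method names are hypothetical. It also wraps the macro argument as `LXString "1"`, which is what the type `Macro String (List LatexExpression)` calls for.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


# Minimal stand-ins for the Elm constructors used in the example.
@dataclass
class LXString:
    text: str


@dataclass
class InlineMath:
    tex: str


@dataclass
class DisplayMath:
    tex: str


@dataclass
class Macro:
    name: str
    args: List["Expr"]


@dataclass
class Environment:
    name: str
    body: "LatexList"


@dataclass
class LatexList:
    items: List["Expr"]


Expr = Union[LXString, InlineMath, DisplayMath, Macro, Environment, LatexList]

NAME = re.compile(r"[A-Za-z]+")
TEXT = re.compile(r"[^$\\]+")  # a run of text up to the next $ or backslash


class MiniParser:
    def __init__(self, src: str) -> None:
        self.src, self.pos = src, 0

    def skip_ws(self) -> None:
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1

    def parse_list(self, env: Optional[str] = None) -> LatexList:
        """Parse items until end of input or, inside an environment, its \\end{env}."""
        end_tag = "\\end{" + env + "}" if env else None
        items: List[Expr] = []
        while True:
            self.skip_ws()
            if end_tag and self.src.startswith(end_tag, self.pos):
                self.pos += len(end_tag)
                return LatexList(items)
            if self.pos >= len(self.src):
                if end_tag:
                    raise SyntaxError("missing " + end_tag)
                return LatexList(items)
            items.append(self.parse_item())

    def parse_item(self) -> Expr:
        src, pos = self.src, self.pos
        if src.startswith("$$", pos):
            close = src.index("$$", pos + 2)  # ValueError if the fence is unclosed
            self.pos = close + 2
            return DisplayMath(src[pos + 2:close])
        if src[pos] == "$":
            close = src.index("$", pos + 1)
            self.pos = close + 1
            return InlineMath(src[pos + 1:close])
        if src[pos] == "\\":
            m = NAME.match(src, pos + 1)
            if m is None:
                raise SyntaxError(f"bad macro name at {pos}")
            self.pos = m.end()
            if m.group() == "begin":
                env = self.read_group()
                return Environment(env, self.parse_list(env))  # recurse into the body
            args: List[Expr] = []
            while self.src.startswith("{", self.pos):
                args.append(LXString(self.read_group()))
            return Macro(m.group(), args)
        m = TEXT.match(src, pos)  # cannot fail: src[pos] is neither $ nor backslash
        self.pos = m.end()
        return LXString(m.group().rstrip())

    def read_group(self) -> str:
        """Read {...}; nested braces are not supported in this sketch."""
        if not self.src.startswith("{", self.pos):
            raise SyntaxError(f"expected '{{' at {self.pos}")
        start = self.pos + 1
        close = self.src.index("}", start)
        self.pos = close + 1
        return self.src[start:close]
```

Running `MiniParser(source).parse_list()` on the theorem example gives a tree with the same shape as the AST above. The only difference is `[LXString "1"]` in place of `["1"]`.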

And here is how the AST is rendered:

This is MiniLaTeX:
<div class="environment">
  <strong>Theorem</strong>
  <div class="italic">
    This is a test: $\alpha^2 = 7$ \foo{1}
    <div class="environment">
      <strong>A</strong>
      <div class="italic">
        la di dah
      </div>
    </div>
  </div>
</div>
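A renderer for the tree is a straightforward recursive walk. The Python sketch below uses hypothetical names and only the constructors from the example. It reproduces the structure of the HTML above:

- every environment becomes a `div` with a bold, capitalized title and an italic body;
- math is emitted with its dollar signs, so that MathJax can find it;
- macros are written back out unchanged.

Whitespace differs slightly from the listing, which makes no difference to a browser.

```python
import html
from dataclasses import dataclass
from typing import List, Union


@dataclass
class LXString:
    text: str


@dataclass
class InlineMath:
    tex: str


@dataclass
class Macro:
    name: str
    args: List["Expr"]


@dataclass
class Environment:
    name: str
    body: "Expr"


@dataclass
class LatexList:
    items: List["Expr"]


Expr = Union[LXString, InlineMath, Macro, Environment, LatexList]


def render(e: Expr) -> str:
    if isinstance(e, LXString):
        return html.escape(e.text, quote=False)
    if isinstance(e, InlineMath):
        # Keep the dollar fences so that MathJax finds and typesets the formula.
        return "$" + html.escape(e.tex, quote=False) + "$"
    if isinstance(e, Macro):
        # Macros are written back out exactly as they appeared in the source.
        return "\\" + e.name + "".join("{" + render(a) + "}" for a in e.args)
    if isinstance(e, Environment):
        return (
            '<div class="environment">\n'
            f"<strong>{html.escape(e.name.capitalize())}</strong>\n"
            '<div class="italic">\n'
            f"{render(e.body)}\n"
            "</div>\n</div>"
        )
    if isinstance(e, LatexList):
        return " ".join(render(item) for item in e.items)
    raise TypeError(f"unexpected node: {e!r}")
```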

Comments on Rendering

You might ask, “What’s the deal with \foo{1}?” It was passed on as-is, and that turns out to be a good strategy. If MiniLaTeX understands a TeX macro, it acts accordingly: for example, in the case of \emph{Wow!}, the text “Wow!” is italicized. But if MiniLaTeX doesn’t know about the macro, it just passes it on. If the macro appears inside something like an equation environment, MathJax may know how to handle it, so passing it on as-is is the right thing to do. If neither MiniLaTeX nor MathJax knows how to handle it, the author can see that something is wrong and take corrective action.
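The pass-through policy amounts to a lookup table with a fallback. In this hypothetical Python sketch, `KNOWN_MACROS` stands in for whatever macros the renderer understands. Only `\emph` and `\textbf` appear here, chosen for illustration. Any other macro is rebuilt from its name and arguments and emitted exactly as written.

```python
from typing import Callable, Dict, List

# Hypothetical table of macros the renderer understands; the entries are examples only.
KNOWN_MACROS: Dict[str, Callable[[List[str]], str]] = {
    "emph": lambda args: "<em>" + "".join(args) + "</em>",
    "textbf": lambda args: "<strong>" + "".join(args) + "</strong>",
}


def render_macro(name: str, args: List[str]) -> str:
    """Render a known macro; pass an unknown one through unchanged."""
    handler = KNOWN_MACROS.get(name)
    if handler is not None:
        return handler(args)
    # Fallback: rebuild the original source so that MathJax (inside math) or
    # the author (outside it) can see and deal with the macro.
    return "\\" + name + "".join("{" + a + "}" for a in args)
```

With this table:

- `render_macro("emph", ["Wow!"])` returns `<em>Wow!</em>`;
- `render_macro("foo", ["1"])` returns `\foo{1}` unchanged.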

You might also ask: “What’s the deal with \begin{a}?” There is no environment named “a” in LaTeX. True, of course. The idea here is to have a default way of handling environments that MiniLaTeX does not know about, so that they will “just work.” When running a MiniLaTeX document through a conventional tool such as pdflatex, the user can define these environments or use a macro package that defines them. For example, a trial lawyer may wish to use an “objection” environment, or a psychic may wish to have a “prediction” environment. Whatever. In the meantime, both the lawyer and the psychic can use these environments in MiniLaTeX. Thus, even though the prediction environment is not wired into MiniLaTeX, the text

\begin{prediction} MiniLaTeX will be a big success. \end{prediction}
\begin{prediction} You will win the lottery. \end{prediction}

renders as

Prediction 1 MiniLaTeX will be a big success.
Prediction 2 You will win the lottery.

Yes!
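One way to get this behavior is to keep a counter per environment name. Any unknown environment is then rendered as its capitalized name, its number, and its body. The Python sketch below does this with a regular expression. Three things in it are assumptions:

- each environment name gets its own counter;
- the output is plain text;
- the function name is hypothetical.

The lazy regular expression does not handle nested environments of the same name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches one non-nested environment; \1 makes the \end name match the \begin name.
ENVIRONMENT = re.compile(r"\\begin\{(\w+)\}(.*?)\\end\{\1\}", re.DOTALL)


def render_default_environments(source: str) -> List[str]:
    """Render each environment as "<Name> <number> <body>", numbering each name separately."""
    counters: Dict[str, int] = defaultdict(int)
    rendered: List[str] = []
    for m in ENVIRONMENT.finditer(source):
        name, body = m.group(1), m.group(2).strip()
        counters[name] += 1
        rendered.append(f"{name.capitalize()} {counters[name]} {body}")
    return rendered


source = r"""
\begin{prediction} MiniLaTeX will be a big success. \end{prediction}
\begin{prediction} You will win the lottery. \end{prediction}
"""
for line in render_default_environments(source):
    print(line)
```

Run on the two predictions, it prints the two numbered lines shown above.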

Notes

  1. We are exploring ways for users to define non-default environment behaviors in the browser. The same goes for macros used outside the dollar and double-dollar fences.
  2. Authors using www.knode.io can use whatever macro definitions they need for the MathJax part of MiniLaTeX documents. One way is to put their definitions inside double dollar signs in the text of the document. Another way is to set up a document of macro definitions that is included in other MiniLaTeX documents.
  3. MiniLaTeX Grammar: a draft, but it gives an idea of what is going on. The grammar is LL(*) because it has to look arbitrarily far ahead to recognize a \begin ... \end block. Since the largest unit parsed is a logical paragraph, one should think of it as LL(N), where N is the number of characters in the paragraph to be parsed. A logical paragraph is either an ordinary paragraph or an outer \begin ... \end block.
  4. knode.io seems to work much better in Firefox than in Chrome — at least for me.
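Note 3 describes the logical paragraph as the unit handed to the parser. Here is a minimal Python sketch of cutting source text into such units. A blank line ends a paragraph only when we are outside every `\begin ... \end` block. Counting `\begin{` and `\end{` on each line is a simplifying assumption that ignores comments and verbatim text. The function name is hypothetical.

```python
import re
from typing import List

BEGIN = re.compile(r"\\begin\{")
END = re.compile(r"\\end\{")


def logical_paragraphs(source: str) -> List[str]:
    """Split source at blank lines, except inside \\begin ... \\end blocks."""
    paragraphs: List[str] = []
    current: List[str] = []
    depth = 0  # how many \begin blocks are currently open
    for line in source.splitlines():
        if not line.strip() and depth == 0:
            if current:  # a blank line at top level closes the paragraph
                paragraphs.append("\n".join(current))
                current = []
            continue
        current.append(line)  # blank lines inside a block are kept
        depth = max(0, depth + len(BEGIN.findall(line)) - len(END.findall(line)))
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs
```

Each paragraph is then parsed on its own. This is why the parser's lookahead never needs to exceed the length of the paragraph.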

Credits

I wish to acknowledge the generous help that I have received throughout the work on this project from the community at elmlang.slack.com, with special thanks to Ilias van Peer.

Jim Carlson
