Publishing Technical Documents with ePub

Prerelease Version

Adding Math to ePub

Adding Math to your content is a painful issue for many of you. There are several reasons for this:

  1. Math support has lagged across all ePub platforms. It wasn’t even a part of the ePub 2.x standard and has been one of the last items added to Readers supporting ePub 3.x.
  2. The official solution for adding math content, MathML, is not very human-readable and is not really the “go-to” solution for Mathematicians and Scientists when it comes to publishing their own work.
  3. The people supplying the content often have different concerns than you do, which can lead to problems. For example, they may not have considered accessibility requirements, the need to convert their source files to HTML at some point, or support for other features such as internal links to equations.
  4. There is little overlap in the skills needed to create ePubs and the skills needed to comprehend the Math you are trying to add. This means ePub creators often have uncertainty about what should and should not be changed when adding Math to the finished product.

If we’re going to address these issues, and we are, then we’ve got a lot of ground to cover in this chapter. Since the goal is to incorporate math into your finished ePub, we’ll cover MathML (in brief) and what the ePub spec requires when adding that content. From there, we will cover conversion to and from other Math file formats, how to correctly style your Math, alternative ways to present Math content, and other workflow ideas for handling this type of content.

MathML

As mentioned before, MathML is not really designed to be human-readable, and it’s really not designed to be human-editable. If you’re familiar with the spec, you can make changes and maybe even create equations from scratch. But most of the time, you’d be better off using other tools to build your content and then converting those results into MathML.

That being said, it’s still a good idea to give an overview of this format so that you can have an idea of how it is used in your documents.

Example MathML:

<m:math display="block" xmlns:m="http://www.w3.org/1998/Math/MathML" 
        alttext="x^2 + 4x = 5">
  <m:msup>
    <m:mi>x</m:mi>
    <m:mn>2</m:mn>
  </m:msup>
  <m:mo>+</m:mo>
  <m:mn>4</m:mn>
  <m:mi>x</m:mi>
  <m:mo>=</m:mo>
  <m:mn>5</m:mn>
</m:math>

Which results in this equation:

$$x^2 + 4x = 5$$

From a coding perspective, MathML is a tree data structure. Some elements, like <msup>, act as containers for other elements (so they are “stem” nodes). Other elements, like <mi> or <mn>, act as containers for the actual character data (so they are “leaf” nodes). Different types of character data require different leaf node wrappers (numbers use an <mn> element, for example), and you should not have any character data loose in the XML. It all needs to be wrapped in some kind of leaf node.
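
To make the stem/leaf distinction concrete, here is a minimal Python sketch (standard library only) that parses the example above, prints each node as a stem or a leaf, and flags any character data sitting loose inside a stem element. The set of leaf element names is an assumption covering the common token elements, not a complete list from the spec, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the common MathML token ("leaf") elements; extend as needed.
LEAF_TAGS = {"mi", "mn", "mo", "mtext", "ms", "mspace"}

EXAMPLE = """<m:math display="block" xmlns:m="http://www.w3.org/1998/Math/MathML"
        alttext="x^2 + 4x = 5">
  <m:msup>
    <m:mi>x</m:mi>
    <m:mn>2</m:mn>
  </m:msup>
  <m:mo>+</m:mo>
  <m:mn>4</m:mn>
  <m:mi>x</m:mi>
  <m:mo>=</m:mo>
  <m:mn>5</m:mn>
</m:math>"""


def local_name(elem):
    """Strip the '{namespace}' part ElementTree puts in front of each tag."""
    return elem.tag.split("}")[-1]


def describe(elem, depth=0):
    """Print the tree, marking each node as a stem or a leaf."""
    name = local_name(elem)
    if name in LEAF_TAGS:
        print("  " * depth + f"{name} (leaf): {(elem.text or '').strip()!r}")
    else:
        print("  " * depth + f"{name} (stem)")
        for child in elem:
            describe(child, depth + 1)


def find_loose_text(root):
    """Return (stem_name, text) pairs for character data not wrapped in a leaf."""
    problems = []
    for elem in root.iter():
        name = local_name(elem)
        if name in LEAF_TAGS:
            continue
        # Text before the first child, plus text after each child's closing tag,
        # belongs directly to this stem element.
        for chunk in [elem.text] + [child.tail for child in elem]:
            if chunk and chunk.strip():
                problems.append((name, chunk.strip()))
    return problems


if __name__ == "__main__":
    root = ET.fromstring(EXAMPLE)
    describe(root)
    print(find_loose_text(root))
```

An empty list means every piece of character data is inside a leaf. Markup like <m:mrow>4<m:mi>x</m:mi></m:mrow> would be reported, because the 4 is not wrapped in an <mn>.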

You can use raw Unicode characters in the leaf nodes of your MathML, but characters outside the ASCII range are often converted to numeric character references (e.g. &#x2205; or &#8705;) instead. See Appendix I or the MathML spec if you want more details on MathML elements.
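
If your toolchain expects numeric character references rather than raw Unicode, the conversion is mechanical. This minimal Python sketch (the function name is hypothetical) rewrites every non-ASCII character as a hexadecimal reference; Python’s built-in "xmlcharrefreplace" error handler does the same thing in decimal form.

```python
def to_numeric_refs(text, hexadecimal=True):
    """Replace each non-ASCII character with a numeric character reference.

    ASCII characters, including markup such as '<' and '>', pass through
    unchanged, so this can be applied to a whole MathML fragment.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif hexadecimal:
            out.append(f"&#x{code:04X};")
        else:
            out.append(f"&#{code};")
    return "".join(out)


if __name__ == "__main__":
    fragment = "<m:mi>\u2205</m:mi>"  # U+2205 EMPTY SET
    print(to_numeric_refs(fragment))  # <m:mi>&#x2205;</m:mi>
    # The standard library's decimal equivalent:
    print(fragment.encode("ascii", "xmlcharrefreplace").decode("ascii"))  # <m:mi>&#8709;</m:mi>
```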

Also, you will likely want to add wrapping <div> elements around <math> elements in your content so that you can use CSS to indent, center, and adjust size and space around your MathML.

Adding MathML to Your ePub

MathML in ePub differs from standard MathML in a few important ways:

Each MathML element needs to be in the MathML namespace, which is why all of the elements in the examples have the m: prefix. This keeps the parser from mistaking these non-HTML tags for HTML and getting confused.
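
If you generate MathML from a script, the namespace is easiest to get right by letting an XML library manage it. This sketch uses Python’s standard ElementTree to rebuild the earlier example. The m() helper is a hypothetical convenience; register_namespace() is what makes the serialized output use the m: prefix.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with the "m:" prefix used in this chapter's examples.
ET.register_namespace("m", MATHML)


def m(tag, text=None, **attrs):
    """Create an element in the MathML namespace (hypothetical helper)."""
    elem = ET.Element(f"{{{MATHML}}}{tag}", attrs)
    elem.text = text
    return elem


def build_example():
    """Rebuild x^2 + 4x = 5 as namespaced Presentation MathML."""
    math = m("math", display="block", alttext="x^2 + 4x = 5")
    sup = m("msup")
    sup.extend([m("mi", "x"), m("mn", "2")])
    math.append(sup)
    for tag, text in [("mo", "+"), ("mn", "4"), ("mi", "x"), ("mo", "="), ("mn", "5")]:
        math.append(m(tag, text))
    return math


if __name__ == "__main__":
    print(ET.tostring(build_example(), encoding="unicode"))
```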

Also, you should add an alttext attribute (or use annotation-xml) to spell out the equation being represented. The recommendation on this page suggests using MathSpeak for the description, but you can use other grammars if you wish (the example uses TeX). Just remember that the alttext attribute is limited to 255 characters.
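
A quick automated check can catch missing or overlong descriptions before ePubCheck or a reviewer does. This Python sketch (standard library only; the function name is hypothetical) reports each <math> element that has neither an alttext attribute nor an annotation-xml, or whose alttext exceeds the 255-character limit mentioned above.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
ALTTEXT_LIMIT = 255  # the limit given in this chapter


def check_descriptions(xhtml):
    """Return (index, problem) for each <math> with a missing or overlong description."""
    root = ET.fromstring(xhtml)
    problems = []
    for index, math in enumerate(root.iter(f"{{{MATHML}}}math")):
        alt = math.get("alttext")
        has_annotation = math.find(f".//{{{MATHML}}}annotation-xml") is not None
        if alt is None and not has_annotation:
            problems.append((index, "no alttext or annotation-xml"))
        elif alt is not None and len(alt) > ALTTEXT_LIMIT:
            problems.append((index, f"alttext is {len(alt)} characters; use annotation-xml"))
    return problems
```

Pass it the text of each chapter file; an empty list means every equation has a description of acceptable length.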

Finally, you need to add mathml to the list of properties in the manifest <item> element of your OPF file (see here).
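
If you have many chapters, you can set this property from a script instead of by hand. This sketch assumes you have already read each chapter’s XHTML into a dictionary keyed by its href (a hypothetical setup, as are the function names and sample file names). It adds mathml to the properties of any manifest item whose file contains a MathML <math> element, without disturbing existing properties such as nav.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
MATHML = "http://www.w3.org/1998/Math/MathML"

SAMPLE_MANIFEST = f"""<manifest xmlns="{OPF}">
  <item id="ch05" href="ch05.xhtml" media-type="application/xhtml+xml"/>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
</manifest>"""

SAMPLE_CHAPTERS = {
    "ch05.xhtml": f'<html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="{MATHML}">'
                  '<body><m:math><m:mi>x</m:mi></m:math></body></html>',
    "nav.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>',
}


def uses_mathml(xhtml):
    """True if the document contains at least one MathML <math> element."""
    root = ET.fromstring(xhtml)
    return next(root.iter(f"{{{MATHML}}}math"), None) is not None


def add_property(item, prop):
    """Add a token to an item's space-separated properties attribute, once."""
    props = item.get("properties", "").split()
    if prop not in props:
        props.append(prop)
    item.set("properties", " ".join(props))


def mark_mathml_items(manifest, chapters):
    """chapters maps href -> XHTML text."""
    for item in manifest.iter(f"{{{OPF}}}item"):
        xhtml = chapters.get(item.get("href"))
        if xhtml is not None and uses_mathml(xhtml):
            add_property(item, "mathml")


if __name__ == "__main__":
    ET.register_namespace("", OPF)
    manifest = ET.fromstring(SAMPLE_MANIFEST)
    mark_mathml_items(manifest, SAMPLE_CHAPTERS)
    print(ET.tostring(manifest, encoding="unicode"))
```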

Note on <annotation-xml>

If you need a description longer than 255 characters, you will need to make use of the semantics and annotation-xml elements. This will require some changes to the XML in your <math> element. This chapter in the MathML spec describes how this works, but its examples will not validate with ePubCheck without some changes. A better example is here:

Example MathML (with Semantics)

<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
  <m:semantics>
    <m:mrow>
      <m:msup>
        <m:mi>x</m:mi>
        <m:mn>2</m:mn>
      </m:msup>
      <m:mo>+</m:mo>
      <m:mn>4</m:mn>
      <m:mi>x</m:mi>
      <m:mo>=</m:mo>
      <m:mn>5</m:mn>
    </m:mrow>
    <m:annotation-xml encoding="application/xhtml+xml">
      <span>x^2 + 4x = 5</span>
    </m:annotation-xml>
  </m:semantics>
</m:math>

The resulting MathML will render exactly the same as “plain” MathML in most Readers, but the document will also include whatever description of the math you wish to add, without the 255-character limit. Note that the semantics element requires all of your Presentation MathML to be wrapped in a single root element (usually an mrow will do) in order to parse correctly.
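
If your MathML arrives without a description and you need more room than alttext allows, the restructuring described above can be scripted. This ElementTree sketch (add_description() is a hypothetical name) moves the existing presentation markup into a single mrow inside semantics and appends an annotation-xml whose content is an XHTML span, mirroring the example above.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("m", MATHML)
ET.register_namespace("", XHTML)


def q(tag):
    """Qualified MathML tag name in ElementTree's '{namespace}tag' form."""
    return f"{{{MATHML}}}{tag}"


def add_description(math, description):
    """Wrap the children of <math> in semantics/mrow and add an annotation-xml."""
    children = list(math)
    for child in children:
        math.remove(child)
    semantics = ET.SubElement(math, q("semantics"))
    mrow = ET.SubElement(semantics, q("mrow"))  # the single root semantics needs
    mrow.extend(children)
    annotation = ET.SubElement(semantics, q("annotation-xml"),
                               encoding="application/xhtml+xml")
    span = ET.SubElement(annotation, f"{{{XHTML}}}span")
    span.text = description
    return math


if __name__ == "__main__":
    math = ET.fromstring(
        f'<m:math xmlns:m="{MATHML}"><m:msup><m:mi>x</m:mi><m:mn>2</m:mn></m:msup>'
        '<m:mo>+</m:mo><m:mn>4</m:mn><m:mi>x</m:mi><m:mo>=</m:mo><m:mn>5</m:mn></m:math>'
    )
    add_description(math, "x^2 + 4x = 5")
    print(ET.tostring(math, encoding="unicode"))
```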

Generating Math Content

Even a brief look at MathML should show you that building equations by hand is both time-consuming and error-prone. You would need a good understanding of the MathML spec, a Unicode character chart, and the patience to decipher the inevitable validation errors. This raises the question:

If humans didn’t build this by hand, where did the content come from? And how did they check their work?

The answer is that they used other tools to create the content. Some of these tools include equation editors, TeX scripting tools, MathJax (or other browser-based tools), and math conversion libraries.

Equation Editors

These are apps that have some kind of WYSIWYG interface to help you lay out your equations. Once you have the math the way you want, you can export the result into MathML, TeX, an image format like PNG or JPG, or even an SVG.

There are a number of equation editors out there. I’ve even written one myself. The most established (and expensive) is probably MathType, but there are many others, including some good free ones. A quick search will likely turn up something at the price point (and quality) that matches your needs.

TeX Tools

TeX/LaTeX is a free and very well-documented typesetting language and publishing system that a number of Mathematicians and Scientists use to produce their own papers. (TeX was written by Donald Knuth; LaTeX is a macro package built on top of it.) Here is a chapter from a Wikibook that can get you started with the Math parts of the language. The platform itself is at least as large as ePub and has a number of quirks. It is probably best to limit yourself to the math section at first, but later on you can use it to produce PDFs, even press-ready files, and it has a number of other cool features that are beyond the scope of this book.

TeX itself is used by a number of equation editors and other software tools (like Apple’s iBooks Author). But more relevant to our purposes is the existence of tools that can convert TeX to MathML right from the command line. One of the best of these is BlahTeX. You will need to build the software from source, but that is relatively painless for just the plain blahtex tool. You can even create PNG versions of the math, if you have other required software, like a TeX distribution, installed.

MathJax

MathJax is a JavaScript-based display engine for Mathematics that works in all major browsers. It can display MathML, TeX, and even ASCII-Math, and it does a very good job of finding and rendering math in your web documents. While you can, theoretically, embed this library and use it to display Math in some ePub Readers, it is much more useful as a way to preview the math in your HTML content and make sure it looks correct before you build your ePub.

Depending on your workflow, you may need to do some scripting to incorporate MathJax into your HTML file headers when previewing, and then use a different set of HTML headers for the actual ePub. But if you have to deal with a large amount of Math from disparate sources, this can reduce some of the stress, as you at least have an idea of what the math is supposed to look like before the ePub Reader gets ahold of it.
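
As one way to do that scripting, this Python sketch writes preview copies of your chapters with a MathJax script tag inserted just before </head>, leaving the files that go into the ePub untouched. The CDN URL is an assumption (MathJax 3’s combined TeX/MathML component on jsDelivr); pin whatever version your project uses. The function names and directory layout are hypothetical.

```python
import re
from pathlib import Path

# Assumption: MathJax 3 from the jsDelivr CDN. Pin the version your project uses.
MATHJAX_SCRIPT = (
    '<script id="MathJax-script" async="async" '
    'src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>'
)


def make_preview(xhtml):
    """Return a copy of a chapter with MathJax loaded just before </head>."""
    if 'id="MathJax-script"' in xhtml:
        return xhtml  # already set up for preview
    match = re.search(r"</head\s*>", xhtml, re.IGNORECASE)
    if match is None:
        raise ValueError("document has no </head>")
    head_end = match.start()
    return xhtml[:head_end] + MATHJAX_SCRIPT + "\n" + xhtml[head_end:]


def build_previews(src_dir, preview_dir):
    """Write preview copies of every .xhtml file; the ePub sources are not modified."""
    out = Path(preview_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.xhtml")):
        text = path.read_text(encoding="utf-8")
        (out / path.name).write_text(make_preview(text), encoding="utf-8")
```

Keeping the preview copies in a separate directory means you never have to strip MathJax back out before packaging.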

Other Conversion Libraries

Pandoc is a command-line conversion tool that can handle HTML, ePub, TeX/LaTeX, Markdown, and even word processing files from MS Word or OpenOffice. It has a bit of a learning curve, but it is well documented and easy to add to scripts. I don’t believe it handles MathML conversion as reliably as BlahTeX, but it can help you deal with source files that are in a troublesome format.
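
For example, a script might build the Pandoc call for a LaTeX source file like this. --from, --to, --mathml, --standalone, and --output are standard Pandoc options; the function and file names are hypothetical, and the HTML5 output will likely still need cleanup into well-formed XHTML before it goes into your ePub.

```python
import shlex
import subprocess


def pandoc_command(source, output):
    """Build a Pandoc call: LaTeX in, standalone HTML5 out, math as MathML."""
    return [
        "pandoc",
        "--from=latex",
        "--to=html5",
        "--mathml",      # convert TeX math to MathML
        "--standalone",  # full document with <head>, not just a fragment
        str(source),
        "--output", str(output),
    ]


def convert(source, output):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, output), check=True)


if __name__ == "__main__":
    # Print the command rather than run it, so this works without Pandoc installed.
    print(shlex.join(pandoc_command("chapter05.tex", "chapter05.html")))
```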

Other tools that I’ve come across, but haven’t researched completely:

  • Lasem: another command-line tool for rendering MathML and SVG.
  • MathJaxRender: one of a number of projects that call the MathJax library from PhantomJS and use the result to build SVG and other graphics files.
  • MathML Cloud: a much more sophisticated take on the previous approach. It uses MathJax to render the graphical results (PNG and SVG), but it also has tools that automatically generate a text description of the math using MathSpeak. It’s very new, but its website has a good interface and API. It’s also open source and available on GitHub.

There are also a number of LaTeX-based tools that handle conversion. A quick search turns up projects such as LaTeX2HTML, mtex2MML, TtH, TeX4ht, plasTeX, and Hevea. Nearly all are open source projects that may morph, mutate, or disappear and be replaced by other solutions. But if you have LaTeX in your workflow, you can be sure there are tools out there to help with conversion.

This W3C page has links to a number of XSLT tools that can be used to convert MathML to other formats. XSLT is an XML-based language for transforming XML files (including MathML) into other formats, including other types of XML; it is outside the scope of this book. Your input has to meet some requirements (it must be well-formed and parse correctly, for example), but XSLT can be a useful tool depending on your workflow. Another W3C page has some additional information on this.

Styling Math in Your Documents

We’ve already covered most of the issues with adding Math content to your ePub, but you also need to style your document so that the Math actually looks like Math instead of blending in with the rest of your text. At least one book, Math Into Type, has been written on proofing and copyediting Math. The rules can be complex and aren’t likely to be needed outside the context of scientific or mathematical journal articles. But understanding some basics can help you catch obvious errors and also avoid conflicts with people who may understand the rules better than you.

Use of Italics

In Math, italics indicate that a letter is a variable rather than part of a word. Functions (such as sin or log), on the other hand, are printed in roman to show that they are not variables. Lowercase Greek letters (which are often used as variable names) are generally italicized, though some styles set them in roman. The upshot is that if you see a variable name in the text body that is not italicized, that is usually a mistake unless otherwise indicated. If you see an obvious function name in italics, that is likely also a mistake.

Spacing in Equations

You shouldn’t add spaces between variables in an equation, or between a number and a variable: placing them side by side already indicates multiplication. You should add thin spaces around operators (+, −, etc.) and around other symbols that act as verbs (e.g., =). Math rendering engines, whether MathJax or browser-based, should handle this automatically. If you see a problem with the spacing, it likely indicates a problem with the input MathML, and you need to find out what that problem is rather than add extra markup to work around it. If you see a spacing problem in normal text, you can either space it yourself, if it is something simple, or convert it to inline MathML if not.
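
For the simple plain-text case, here is a small Python sketch (space_operators() is a hypothetical name) that replaces hyphens with a true minus sign (U+2212) and puts a THIN SPACE (U+2009) on each side of +, −, =, < and >, while leaving juxtaposed terms like 4x alone. It assumes every operator is binary, so a leading unary minus would pick up a stray space.

```python
import re

THIN_SPACE = "\u2009"
MINUS_SIGN = "\u2212"
# Assumption: binary operators and relations only; unary minus is not handled.
OPERATORS = "+=<>" + MINUS_SIGN


def space_operators(expr):
    """Put a thin space on each side of every operator in a short plain-text equation."""
    expr = expr.strip().replace("-", MINUS_SIGN)  # typographic minus, not a hyphen
    # Replace any ad-hoc spaces around an operator with one thin space per side;
    # everything else (e.g. '4x', where juxtaposition means multiplication) is untouched.
    pattern = rf"\s*([{re.escape(OPERATORS)}])\s*"
    return re.sub(pattern, THIN_SPACE + r"\1" + THIN_SPACE, expr)
```

For example, space_operators("4x+3=11") returns 4x, +, 3, =, 11 with thin spaces around the + and =.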

Display Equations

These are the large equations that are set apart from the main text. In most cases, they are also styled to be slightly larger than the surrounding text, centered, and given some whitespace above and below. They may also have equation numbers (which are referred to in the text). You are dependent on the Reader to render some of this correctly, but you can help it along by wrapping the <math> in a div element and adding some CSS styling. Spacing and centering are the most obvious properties to style this way.
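
If your content has many display equations, adding the wrapper can be scripted. This Python sketch (ElementTree only; wrap_display_math() and the equation class name are hypothetical) wraps every <math display="block"> in a <div class="equation">, which you could then style with a rule such as div.equation { text-align: center; margin: 1em 0; }. Running it twice does not double-wrap.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("", XHTML)
ET.register_namespace("m", MATHML)


def wrap_display_math(root, css_class="equation"):
    """Wrap every <math display="block"> in <div class="..."> for CSS styling."""
    # ElementTree elements don't know their parents, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for math in list(root.iter(f"{{{MATHML}}}math")):
        if math.get("display") != "block":
            continue
        parent = parents.get(math)
        if parent is None:
            continue  # the <math> is the root itself; nothing to wrap it in
        if parent.tag == f"{{{XHTML}}}div" and parent.get("class") == css_class:
            continue  # already wrapped
        index = list(parent).index(math)
        div = ET.Element(f"{{{XHTML}}}div", {"class": css_class})
        parent.remove(math)
        div.append(math)
        parent.insert(index, div)
        # Text that followed the equation now follows the div instead.
        div.tail, math.tail = math.tail, None
    return root
```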

As for equation numbering, the recommendation is to wrap your equation in an <mtable> element with the equation body in one <mtd> and the equation number in another. This is the same kind of layout-by-table nonsense we left behind ages ago in HTML. However, the alternative, adding numbers with CSS counters or JavaScript, may not be much better in terms of accessibility or compatibility across Readers. How you add the numbers is up to you, but if they are referred to in the text, you should add them.
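
Here is a minimal ElementTree sketch of the <mtable> approach. The numbered_equation() and token() helpers and the eq-<number> id scheme are hypothetical; the id on the row simply gives the text something to link to. How the number cell is positioned is left to the Reader (or to your CSS), and support varies.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", MATHML)


def q(tag):
    """Qualified MathML tag name in ElementTree's '{namespace}tag' form."""
    return f"{{{MATHML}}}{tag}"


def token(tag, text):
    """Create a MathML leaf element such as <mi>x</mi>."""
    elem = ET.Element(q(tag))
    elem.text = text
    return elem


def numbered_equation(body, number):
    """Place an equation and its number in separate cells of a one-row <mtable>.

    body is a list of MathML elements; number is the label, e.g. "5.1".
    """
    math = ET.Element(q("math"), {"display": "block"})
    table = ET.SubElement(math, q("mtable"))
    row = ET.SubElement(table, q("mtr"), {"id": f"eq-{number}"})
    equation_cell = ET.SubElement(row, q("mtd"))
    equation_cell.extend(body)
    number_cell = ET.SubElement(row, q("mtd"))
    ET.SubElement(number_cell, q("mtext")).text = f"({number})"
    return math


if __name__ == "__main__":
    body = [token("mi", "x"), token("mo", "="), token("mn", "5")]
    print(ET.tostring(numbered_equation(body, "5.1"), encoding="unicode"))
```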

MathML does have support for aligning groups of equations at specific points in a column (e.g., aligning equations so the “=” sign lines up). However, this may not be implemented very well in most Readers. See this page for more details on how alignment is meant to work.

Alternatives to MathML

MathML should be your first choice for presenting Math in ePubs, but it isn’t supported in Version 2.x, it may not be supported correctly (or at all) in some Readers, and it may not be viable in your workflow for other reasons. If you want to put Math content in your ePub, and you can't use MathML, there are some options.

Option 1: Use an image

This works well enough for display math: you can use an equation editor or one of the other tools to convert your complex equation to an image, then use CSS to center it and handle any spacing. You can even add numbering without too much bother. It doesn’t work very well for inline math, however.

There are some other downsides to this approach.

  1. Since it’s an image, it won’t scale with changes in text size and may look odd to the reader.
  2. You will need to add alt text describing each equation to meet accessibility requirements. You likely will need to do that for MathML, as well, so that’s not a huge deal.
  3. Adding equation images can increase your overall file size, especially if you have a large number of equations.
  4. Depending on your workflow, it may not be easy to automate the conversion process. If you have to do this on a regular basis, you may need a better solution.

Option 2: Use styled text

This works well enough for inline equations, especially simple ones. Just take a preview of the end result (generated in MathJax, for example) and try to add spacing and styling to make your normal text look as close as possible to the preview. For display equations, this may involve using large symbols and tables, which is a little more work.
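
As a starting point for simple inline math, this Python sketch converts a very small subset of TeX-style notation (single-letter variables, digits, +, -, =, and ^ or _ scripts) into styled HTML: variables in <i>, scripts in <sup> or <sub>, a true minus sign, and thin spaces around operators. tex_to_styled_html() is a hypothetical helper, not a real TeX parser; commands such as \sin or \frac are not handled, and multi-letter function names would wrongly come out italic.

```python
import html
import re

THIN_SPACE = "\u2009"
# One token at a time: a ^/_ script with a {group} or single character,
# a letter (variable), a +/-/= operator, or any other visible character.
TOKEN = re.compile(r"([\^_])(\{[^{}]*\}|[^\s{}])|([A-Za-z])|([-+=])|(\S)")


def tex_to_styled_html(tex):
    """Turn very simple TeX-style math into HTML text styling (whitespace is dropped)."""
    out = []
    for match in TOKEN.finditer(tex):
        script, arg, letter, op, other = match.groups()
        if script:
            inner = arg[1:-1] if arg.startswith("{") else arg
            tag = "sup" if script == "^" else "sub"
            out.append(f"<{tag}>{tex_to_styled_html(inner)}</{tag}>")
        elif letter:
            out.append(f"<i>{letter}</i>")  # variables are italic
        elif op:
            symbol = "\u2212" if op == "-" else op  # typographic minus
            out.append(f"{THIN_SPACE}{symbol}{THIN_SPACE}")
        else:
            out.append(html.escape(other))  # digits, punctuation, '<' -> '&lt;'
    return "".join(out)
```

For the chapter’s running example, tex_to_styled_html("x^2 + 4x = 5") yields an italic x, a superscript 2, and thin-spaced + and = signs. Always compare the result against a MathJax preview.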

If you’re dealing with something so complex that you can’t display it with just HTML, you could fall back on TeX or ASCII-Math syntax and hope your readers are sophisticated enough to read it. This is not ideal in most cases, but it can work if you have small amounts of inline math and want to use images for the display math.

Option 3: Use iBooks

This is more of a tip for people who have to use ePub 2.x for whatever reason but still need to include some Math. If you know for certain that your audience only uses iBooks, you can include MathML in your HTML and iBooks will just run with it.

  1. Use an HTML5-based XHTML document.
  2. Add the MathML namespace to your HTML root element. For example:

     <html xmlns:m="http://www.w3.org/1998/Math/MathML" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">

  3. Add the namespaced MathML to your document. See Example 1 for details.

This will not validate correctly, but assuming you’ve handled everything else in the document correctly (so the only validation errors come from adding the MathML), iBooks will read and display it correctly.

If you want even better-looking Math, you can import your resulting ePub into iBooks Author and tweak the way it looks from there. You may be locked into the Apple ecosystem, but if your target audience is, for example, students in your classroom with their own iPads, it can be an acceptable workaround.

Handling Math in Your Workflow

Rather than try to suggest a one-size-fits-all solution, I’m just going to list some considerations you may have and the likely approaches to take. Hopefully this will help you decide what you need to do for your workflow.

1. Accessibility is a major requirement for my workflow.

The recommended solution is to use ePub 3.x, include MathML (adding semantics and annotation-xml when a description won’t fit in alttext), and use MathSpeak or some other descriptive markup. If you have to support 2.x, you are better off converting each equation to an image and giving it descriptive alt text.

2. I need to convert Math to Images.

a. And I only have to do it occasionally.

You may be best served by installing an equation editor that can open MathML or TeX and having it output the result in an image format. There are a few free ones, and some other good ones that aren’t too expensive. MathML Cloud may be a viable solution as well, though it is still a very young project.

The main advantage of an equation editor or a web app is that it gives you a way to create and edit math without installing a large TeX distribution (which can weigh over 2GB in some cases). A full distribution is overkill if you only need to handle math on rare occasions.

b. I need to convert files often.

You will likely want to use BlahTeX or one of the other conversion tools listed. For LaTeX-based solutions, you will need a LaTeX distribution installed (BlahTeX itself only needs one for PNG output). Other solutions are more lightweight, but if you expect to use TeX anyway, and given the good tools that come with LaTeX distributions, it’s not a big burden.

3. My source file is in images and I want to use MathML instead.

The first thing I would do is ask the content person if they have a way to give you their source in another format. It’s unlikely they generated those images out of thin air, so if it came from an equation editor, see if you can get it in MathML. If it came from TeX, ask for the TeX source, etc.

If that’s not feasible, you will likely need to recreate the Math by hand using an equation editor or TeX. Make sure the content provider understands this issue and, if possible, arrange things so that they don’t supply Math as images in the future.

Of course, you could always just use those images in the ePub, but that shouldn’t be your first choice unless they specifically request the files to be that way.

4. My source file is in TeX or even LaTeX. How do I handle this?

The good news is that Math written in TeX can be converted to MathML without much trouble. And LaTeX documents actually have more structure than a Word file, so you’re not too badly off there. Use some of the tools mentioned (like Pandoc or one of the TeX alternatives) and try to convert the result into HTML + MathML. You want to do as little work by hand as possible. If you can figure out how to automate some of this while you’re doing the work, it’ll save you even more time in the future.

5. My source file is in Word or a PDF or something else strange.

Some of the tools we’ve already mentioned can be helpful here. Much depends on how easily you can create an HTML version of the file and export the math. It can be relatively painless or a complete nightmare.
