I’m rewriting my book in HTML instead of Markdown. Here’s why.
When you think of a book you probably think of prose. A bunch of paragraphs with section headers and chapter names, and perhaps a few illustrations. In short, you are thinking of a paper book. When I first wrote the HTML Canvas Deep Dive I was thinking along those lines as well, but I also wanted interactivity. What’s the point of having an educational book on the web if we can’t push the envelope a bit?
The original book used a hacky JS library I wrote called Jangle, patterned after some research Bret Victor did called Tangle. This was then mixed in with a bunch of Markdown files that were compiled into a series of HTML files using some rather dodgy scripts. And now those scripts don't work anymore and I can't remember how I wrote them. Clearly this solution won’t do any longer. I need tools that are robust and reliable, and hopefully something that others are already using so I don’t have to do all of the heavy lifting myself.
Let’s consider what we will need.
- A semantic format for the prose. Typically this is Markdown or a related format like AsciiDoc.
- A way to extend the prose with custom features my book needs, like interactive examples, zoomable images, and iframe popups.
- A continuous integration system to constantly build the book, ensuring quality and that we don’t get bit rot.
- A standard place to put translations so it’s easy to switch and we don’t have to reinvent the wheel for each new language supported.
- A usable typographic style built with modern CSS features like Grid and Flexbox.
You’ll notice one thing not on this list is print output. Originally I did support printing to PDFs but almost no one used them. And why would they? PDFs don’t support the interactive features that make this book unique. I’m trying to teach dynamic programming of canvas; I need a dynamic medium.
It also doesn’t help that implementations of CSS printing features are still woefully behind the interactive ones.
Markdown vs HTML
So let’s start with the desired input format. The logical solution is Markdown. It looks like plain text, it’s easy for anyone to write, and there are oodles of tools to process it. However, I have decided not to use Markdown. I actually spent a lengthy plane ride recently trying to use it but gave up in frustration. Markdown is great when you are writing prose with only minor formatting, but the minute you need to start extending it, the benefits of Markdown evaporate.
The point of Markdown is that the writer doesn’t have to know any syntax, or at least not very much. Over time Markdown has grown, however. Now you have to remember the pipe syntax for tables. And triple backquotes for code blocks. And don’t forget to include the language of the snippet on the first line with the backquotes. Oh, and make sure you know which particular dialect of Markdown you are working with. Is it GitHub’s or CommonMark? All of these rules are starting to sound like syntax.
Of course there is nothing wrong with syntax per se, but adding new syntax to Markdown turns out to be really difficult. Markdown itself is underspecified and relies on a state machine that switches between inline and block formatting. I tried modifying a parser to implement a simple extension that would let me mark certain images as needing to be pop outs. Every time I tried to pick a symbol or keyword it would conflict with another feature. Markdown is great for what it is, but it is definitely not extensible.
For a few minutes I started looking at other formats like DocBook when I realized I’d had a brain fart. I already have an extensible text format with well-supported tools. HTML! In fact, why do we even need something like Markdown to begin with?
Why is this:
Some <b>bold</b> text
so different from this:
Some **bold** text
HTML can do everything that Markdown can and much more, so why shouldn’t I write with it?
Actually, there are some good reasons not to use HTML as a writing format. HTML can let you overspecify the formatting and overthink what you are writing. It’s all too easy to worry about paragraph vs section vs div when I really should be thinking about what sentences will make sense to my readers. And then we have to make sure the head material and metadata are right. And check the links. And of course, who wants to remember to close all of those tags? HTML isn’t all roses and sunshine.
However, the more I thought about it, the more I realized there is a difference between proper HTML meant for viewing on the web (with all of its links checked and meta tags optimized), and a soft, non-well-formed semantic HTML used for drafts. Most of the problems of writing HTML come from writing final output markup. If instead I use a subset of HTML, and count on the well-designed parsers to handle anything I can throw at them, then semantic HTML will serve me just fine.
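To make the idea of forgiving parsing concrete, here is a minimal sketch in Python using the standard library's `html.parser`. It is not the toolchain used for the book. The `DraftParagraphs` class and `draft_paragraphs` function are hypothetical names invented for this illustration. The sketch treats a new `<p>` as implicitly closing the previous one, so a draft can skip closing tags and leave out the head matter entirely.

```python
from html.parser import HTMLParser


class DraftParagraphs(HTMLParser):
    """Collect paragraph text from draft HTML where </p> may be omitted.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text chunks of the paragraph being read

    def _flush(self):
        # Finish the open paragraph, if any, and normalize whitespace.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly ends the previous one
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text outside any paragraph (for example in a heading) is ignored.
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()  # delivers any buffered trailing text
        self._flush()    # then end the last unclosed paragraph


def draft_paragraphs(source):
    parser = DraftParagraphs()
    parser.feed(source)
    parser.close()
    return parser.paragraphs


# A draft with no html, head, or body element and no closing </p> tags.
draft = """
<h1>Markdown vs HTML</h1>
<p>Some <b>bold</b> text
<p>Another paragraph, never closed
"""

print(draft_paragraphs(draft))
```

Note that `html.parser` only reports tags and text as it sees them; it does not build a tree or insert missing elements. A full HTML5 parser does that repair work for you, which is what the draft format would lean on in practice.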
So here’s the plan.
Write semantic HTML without all of the head matter. I can even leave out the actual html element. The parser will figure it out. Then add extensions with custom elements, data attributes, and all of the other tricks we already know how to do with HTML. Writing an interactive book is within reach again, provided I have the right tools to work with it.
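As a rough sketch of what finding those extensions could look like, the Python below scans a draft fragment for custom elements (tag names containing a hyphen) and `data-` attributes, again using only the standard library's `html.parser`. The element name `interactive-example`, the attribute `data-popout`, the file names, and the `ExtensionFinder` and `find_extensions` helpers are all made up for this example; they are not the book's actual markup or tooling.

```python
from html.parser import HTMLParser


class ExtensionFinder(HTMLParser):
    """Record custom elements and data-* attributes in a draft fragment.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {data attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        data_attrs = {
            name: value for name, value in attrs if name.startswith("data-")
        }
        # Custom element names always contain a hyphen.
        if "-" in tag or data_attrs:
            self.found.append((tag, data_attrs))


def find_extensions(source):
    parser = ExtensionFinder()
    parser.feed(source)
    parser.close()
    return parser.found


# Hypothetical draft markup: one custom element and one data attribute.
draft = """
<p>Drag the slider to change the radius.</p>
<interactive-example src="circle.js"></interactive-example>
<img src="sprites.png" data-popout="true">
"""

for tag, data_attrs in find_extensions(draft):
    print(tag, data_attrs)
```

The appeal of this approach is that the extension points are ordinary HTML, so no parser has to be modified: a build step only needs to look for the names it cares about and leave everything else alone.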
Next time I’ll talk about the new HTML parsing toolchain I found called UnifiedJS.
Posted June 28th, 2019