I thought desktop publishing was a “solved problem”, but it seems to be pretty far from it. When I worked for Sun Microsystems our tech writer would just “magically” compose the final document from our work. When I worked for other companies we mostly wrote internal documentation and used Word. With Codename One, however, I had to pick all of that up myself and I learned how bad things are in the real world.
In this article I’d like to describe our toolchain, the lessons we learned and how we handle docs as a large, heavily documented open source project. There are a lot of asciidoc tutorials and resources; this isn’t one of them. If you are interested in asciidoc, O’Reilly did a great tutorial on that. I will give some tips at the end based on my experience with asciidoc, but I’m mostly writing this to help other projects decide whether they should use asciidoc or stick with their existing documentation toolchain.
Initially I worked with Open Office and Word. They were both OK as editors for text and I was used to working with them but when it came time to do things like code highlighting or collaborative work they became a pain. I also discovered that you need to know what you are doing to produce a good looking document from either one of those. They make it very easy to write “badly” without styling and that makes it very hard to create a proper uniform document.
I liked the visual nature of editing but I didn’t really need it. My favorite feature was probably the much maligned grammar checker in Word… Yes, I know “it’s bad” but so is my grammar.
The end results looked awful and were really hard to maintain though. I also wanted more:
We also toyed with using Google Docs for a while but we had issues in scaling and getting the collaborative aspect going. When the community can edit the document they can break formatting and that made it really hard to follow up. Google Docs output looked even worse than Word so that wasn’t a good choice either.
We started using JBake for our site after trying multiple other static site generators. It worked really well and we loved the idea of static site generation for the common elements. One of the great features in JBake is the support for Asciidoc and after we reviewed all the other options it seemed like this would be the only “reasonable” way to have a guide that we can generate both as PDF and as decent looking HTML.
Besides all of that Asciidoc has a few noticeable advantages over word processors:
We thought about markdown as well but eventually went with asciidoc which seems to be more oriented towards desktop publishing whereas markdown is more oriented towards web output.
The cool thing about asciidoc is that it was practically built with coders in mind. Things like “callouts” that allow you to place numbers within the code and then elaborate on them after the code are trivial in asciidoc but painful with pretty much every other tool I’ve used.
Because we wanted everything to be broken down and manageable, we placed every segment in a separate asciidoc file and hosted them in our GitHub wiki. This allows pretty much anyone to just edit the files and also gives us a great history of changes to the files.
We then need to generate JBake files for the website which should include the custom headers for that. I created a simple shell script that just copies the files from the wiki into our JBake web platform. It’s a bit long but generally it looks something like this (repeated for each file):
If you aren’t familiar with bash the whole part on the top is just the JBake header which is used when converting the files to static HTML.
I then pipe the output through sed, which converts relative image URIs to absolute image URIs. The main logic here is that the manual directory in the website isn’t in the root and I want to store all the images in one place. Having an absolute URL allows me to move the manual to a different location easily. However, the wiki and PDF need relative URLs as the location of the images varies.
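The rewrite step itself is small. Here is a minimal sketch of the same idea in Java rather than sed. It is an illustration, not the actual script: the class name, the method name and the `https://example.com/manual` base URL are all made up for the example, and it assumes images are referenced through the standard `image::` and `image:` macros.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImagePathRewriter {
    // Matches block (image::) and inline (image:) macros with a relative target.
    // The possessive "?+" stops the regex from backtracking into "image:" when
    // the target after "image::" is already absolute. The lookahead skips
    // targets that start with "/" or with a URL scheme such as "https:".
    private static final Pattern RELATIVE_IMAGE = Pattern.compile(
            "\\b(image::?+)(?!/|[a-zA-Z][a-zA-Z0-9+.-]*:)([^\\[\\s]+)\\[");

    /** Prefixes every relative image target with baseUrl (hypothetical helper). */
    static String absolutize(String asciidoc, String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return RELATIVE_IMAGE.matcher(asciidoc).replaceAll(r ->
                Matcher.quoteReplacement(r.group(1) + base + r.group(2) + "["));
    }

    public static void main(String[] args) {
        // The base URL is a placeholder, not the real site layout.
        System.out.println(absolutize(
                "image::img/button.png[Button]", "https://example.com/manual"));
    }
}
```

Targets that are already absolute are left alone, so running the step twice on the same file does no harm.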
I also set the asciidoc hint to use icon fonts instead of images when it’s showing notes and other such elements.
One of the big mistakes I made when we started with asciidoc was the decision to use one big file. Part of the script concatenates all of the files together. A more “modern” approach is to use an include directive from a master file, but that caused some issues early on. The main challenge was keeping links in a form that works both for the JBake web version and for the PDF printed output. This approach works though, so while I’m not too thrilled with it we decided to stick with it for now.
We had to use special macros that our concatenating script processed in order to link differently in the PDF and regular website output:
Notice I used a java extension in the gist because if I use asciidoc gist would try to render the output instead of the content.
Our concatenation script is just a small simple Java app mostly because it’s easier for me to write code in Java than anything else. I also needed something more elaborate than just connecting the files together as I wanted there to be a Table Of Contents index in the HTML output. To do this I needed some logic which was pretty easy to implement in Java (thanks to the text file format):
Notice that this is relatively simple: we just find every top level and second level header based on the conventions, then generate the TOC file. We also do all the basic stuff like setting the date of the developer guide so everything is 100% automated (yes, I know I can do that with a field in Word etc.).
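To give a feel for the header scan, here is a rough sketch. It is not our actual app. It assumes that `==` marks a top level header and `===` a second level one, and it emits a plain nested asciidoc list without links. The `TocBuilder` and `buildToc` names are invented for the example.

```java
import java.util.List;

final class TocBuilder {
    /**
     * Builds a nested asciidoc list from "==" and "===" headers.
     * Lines inside "----" listing blocks are skipped so that code samples
     * containing "==" at the start of a line are not mistaken for headers.
     */
    static String buildToc(List<String> lines) {
        StringBuilder toc = new StringBuilder();
        boolean inListing = false;
        for (String line : lines) {
            if (line.matches("-{4,}\\s*")) {   // listing block delimiter
                inListing = !inListing;
                continue;
            }
            if (inListing) {
                continue;
            }
            if (line.startsWith("=== ")) {
                toc.append("** ").append(line.substring(4).trim()).append('\n');
            } else if (line.startsWith("== ")) {
                toc.append("* ").append(line.substring(3).trim()).append('\n');
            }
        }
        return toc.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildToc(List.of(
                "== Basics", "Some text", "=== Components", "==== Too deep")));
    }
}
```

Because the format is plain text, line prefix checks are enough here and no real parser is needed.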
The macro for the PDF is implemented by toggling the PDF only “magic comment” so code/description act correctly in the PDF mode.
Initially we used that a lot when doing links but lately we’ve been lazy as the special syntax is a bit painful.
When writing a web site you want to link as much as possible. Since we have extensive JavaDocs I thought it would make sense to hyperlink every mention of a class to its JavaDoc page. Besides the SEO benefit this could be useful to developers who can instantly find the class.
Doing this in the preprocessor isn’t an option. It would need a more elaborate parser and I wasn’t interested in going there. Instead I wrote a quick one-off script that hyperlinked everything, then fixed the bad links manually. This seemed to work initially but had a problematic side effect…
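A quick linking pass of this kind could look roughly like the sketch below. It is an illustration rather than the original script: the class list, the `https://example.com/javadoc/` base URL and the helper names are assumptions, and it relies only on the standard JavaDoc layout where `com.foo.Bar` lives at `com/foo/Bar.html`. It is deliberately naive, which is exactly why bad links needed manual fixing afterwards.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class JavaDocLinker {
    /**
     * Replaces whole-word mentions of known classes with asciidoc URL macros.
     * classes maps a simple name ("Button") to its fully qualified name.
     */
    static String link(String text, Map<String, String> classes, String javadocBase) {
        if (classes.isEmpty()) {
            return text;
        }
        String names = classes.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Skip names that already sit inside a link: preceded by "[", "/" or "."
        // or followed by "]" or ".html".
        Pattern mention = Pattern.compile(
                "(?<![\\[/.])\\b(" + names + ")\\b(?!\\]|\\.html)");
        return mention.matcher(text).replaceAll(r -> {
            String simpleName = r.group(1);
            String url = javadocBase
                    + classes.get(simpleName).replace('.', '/') + ".html";
            return Matcher.quoteReplacement(url + "[" + simpleName + "]");
        });
    }

    public static void main(String[] args) {
        // The base URL below is a placeholder.
        System.out.println(link("A Button sits inside a Form.",
                Map.of("Button", "com.codename1.ui.Button",
                       "Form", "com.codename1.ui.Form"),
                "https://example.com/javadoc/"));
    }
}
```

Note that it has no idea whether a word is inside a code block or is just an ordinary English word that happens to match a class name.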
When we generated the PDF every link generates a footnote which makes a lot of sense in theory. However, in a page about Button I might mention it 20 times which will trigger 20 footnotes with the same URL!
Unfortunately I couldn’t find a solution for that so I had to go over our links and try to reduce the amount. It’s just like
Another annoying thing is image behavior. You can easily float an image to the right in HTML output but not in PDF output. This is probably the most annoying formatting issue I came across. I could probably work around it with some creative table structures in some cases, but this seems like such a trivial thing…
One of the biggest problems with asciidoc is that it’s just all over the place. There are a few toolchains; some work while others produce “weird” output or fail without a real reason. We recently tried to generate an epub file directly from our manual asciidoc. It seems this translated the document internally to docbook that was malformed and then failed on validation.
I’m guessing this relates to our asciidoc code, but it’s impossible to know why as the format isn’t validated.
I was only able to work with asciidoctor to HTML and the fo-pub toolchain. Everything else produced artifacts when leafing through the docs. Maybe there was a warning printed along the way but when going over hundreds of pages of output it’s hard to notice warnings. I’m not sure a “lint” like tool would work for something like asciidoc as the format is so loose.
This isn’t that big a deal. Asciidoctor’s docbook output seems to work well and I was able to use that after the fact to generate things such as epub documents using tools such as pandoc. That’s pretty sweet, as you can convert the output to things such as Word relatively easily if you need to send it out, and the output looks good.
Occasionally I had to hack various things in fo-pub to make the document look nicer. E.g. I wanted a good looking cover image for the book which I generated with Spark. So to get this image to “cover” the PDF I had to change asciidoctor-fopub/build/fopub/docbook/fo/division.xsl and add entries for the cover image:
I made a lot of similar edits to customize font size, margin etc.
For the print version I had to remove this code, as Amazon has its own cover and doesn’t accept images that “bleed” (bleeding is when an image intentionally goes out of the print space).
I usually work with NetBeans which has some initial asciidoc support but it’s not there yet.
So I use Atom for the docs. It’s surprisingly usable, although it needs frequent restarts as it hogs the CPU with large projects like this. One of the problems I’ve had with it is weird issues like this one. So if you have a Java based “try with resources” you need to add a semicolon to the end or the syntax highlighting block never ends and makes editing “weird”. It does have some pretty nice extensions like “write good”, which help with my overly verbose writing style.
PDF is generally good for publishing, but when we got to the Kindle print stage we ran into issues with small images that were stretched by default. This meant the image DPI was too low for print and Amazon wouldn’t accept that. Unfortunately they didn’t always list all the problematic images, so I had to upload a PDF, wait for processing, open the preview, fix, then rinse/repeat.
The solution was to add a “scaledwidth=30%” or something like that to images all over so they don’t upscale in the print version of the document.
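Adding the attribute by hand across hundreds of images is tedious, so a small pass like the following could do it in bulk. This is a sketch under assumptions, not something from our toolchain: the class and method names are invented, it only touches block `image::` macros at the start of a line, and it leaves any image that already sets `scaledwidth` alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScaledWidthDefaults {
    // A block image macro at the start of a line: image::target[attributes]
    private static final Pattern BLOCK_IMAGE = Pattern.compile(
            "^(image::[^\\[\\s]+)\\[([^\\]]*)\\]", Pattern.MULTILINE);

    /** Appends scaledwidth=width to block images that do not set it yet. */
    static String apply(String asciidoc, String width) {
        return BLOCK_IMAGE.matcher(asciidoc).replaceAll(r -> {
            String attrs = r.group(2);
            if (attrs.contains("scaledwidth")) {
                return Matcher.quoteReplacement(r.group()); // keep explicit values
            }
            String merged = attrs.trim().isEmpty()
                    ? "scaledwidth=" + width
                    : attrs + ",scaledwidth=" + width;
            return Matcher.quoteReplacement(r.group(1) + "[" + merged + "]");
        });
    }

    public static void main(String[] args) {
        System.out.println(apply("image::img/form.png[Form]", "30%"));
    }
}
```

A single default width won’t suit every image, so the output still needs a visual check before uploading.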
Initially, when we released the Kindle reader version of the document, I made the mistake of using Kindle Text book Publisher instead of generating an epub file. That means I can’t go back to publishing epub without unpublishing the book!
This is a shame, as it means the book isn’t viewable on the standard e-ink devices from Amazon and only on the Kindle Fire style devices. In retrospect I should have been more careful when uploading the first book.
One of the common things we tend to do as developers is work with A4 or letter sizes. This produces a book that’s a bit “large”. In retrospect I might have chosen a smaller form factor for the output and would probably do that for the next book. I’m afraid this will increase the page count, which is already pretty high…
Right now the book clocks in at 600+ pages, but when I started it was closer to 1000 pages. It seems Amazon has a limit of ~890 or so pages. Since I had to shrink images anyway and reduced the font size a bit, the number of pages dropped significantly.
I thought about color printing but that would have sent the book cost into the $50 or higher territory which I don’t think is fair for an open source book. Most of the images don’t really need color in this case. The copy I got doesn’t have color which is fine, but I think the text is a bit faded when compared to other books. I don’t think it’s a deal breaker and I’m not sure other people will notice it as I do.
I used the Amazon wizard for the cover generation which looks decent. I used the ready made cover image and mixed it with the generated cover. One caveat with the first book is that the back looks cartoonishly large. Since the book is of an A4 size the text in print just looked HUGE. I would recommend printing this on your local printer to get a sense of size before publishing.
One of the first things I noticed when I got the physical book back from Amazon was that it ended “abruptly”. I’m so used to books ending with an index and ours just doesn’t have it. We just didn’t include index markup for entries within the asciidoc code. The table of contents is really simple to do in asciidoc but an index requires some work and I still don’t understand why or if there is a better alternative to just littering the docs with index entries. It’s not a deal breaker for a book whose PDF version is available for free (people can just use search there instead of an index) but it’s not ideal.
Overall, self publishing on Amazon is pretty trivial. The tools walk you through most of the steps; you just need to make the right decisions early on, as some things can’t be changed after the fact (easily or at all):
I would use asciidoc for the next project. It has warts but it’s pretty much the document writing tool for coders, so even with the problems I’d go with it for the next book as well.
I think it’s a powerful tool that allows automation for coders and collaboration with familiar tools such as git and CI. I think our process can probably be refined a lot but for now it works.
If you want to look through our asciidoc code and the final results of everything I wrote here check out these links: