Extending Markdown Support in Yaydoc

Yaydoc, our automatic documentation generator, builds static websites from a set of markup documents in markdown or reStructuredText format. Yaydoc uses the Sphinx documentation generator internally, so reStructuredText support comes out of the box. To support markdown we use several techniques depending on the context. Most of the markdown support is provided by recommonmark, a docutils bridge for Sphinx that converts markdown documents into a docutils abstract syntax tree, which Sphinx then converts to HTML. While it works well for most use cases, it falls short in some instances. These are discussed in the following paragraphs.

The first problem was the inclusion of other markdown files in the starting page. Markdown does not support any include mechanism, and if we used the reStructuredText include directive, the included text was parsed as reStructuredText. We solved this earlier using pandoc, an excellent tool for converting between various markup formats. We created another directive, mdinclude, which converts the markdown to reStructuredText before inclusion. Although this was solved a while ago, I'm discussing it here because it was the inspiration behind the solution to our more recent problem.
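
To make the conversion step concrete, here is a minimal sketch of turning markdown into reStructuredText by calling the pandoc command-line tool. It is not Yaydoc's actual mdinclude code: the function names `pandoc_command` and `markdown_to_rst` are hypothetical, and it assumes a `pandoc` executable is available on the PATH.

```python
import shutil
import subprocess


def pandoc_command(source_format="markdown", target_format="rst"):
    """Build the pandoc invocation (hypothetical helper, not from Yaydoc)."""
    return ["pandoc", "--from", source_format, "--to", target_format]


def markdown_to_rst(markdown_text):
    """Convert markdown text to reStructuredText by piping it through pandoc."""
    # Assumption: pandoc is installed; fail with a clear message if it is not.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    result = subprocess.run(
        pandoc_command(),
        input=markdown_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A directive along the lines of mdinclude could read the target file, pass its contents through a function like `markdown_to_rst`, and hand the result to the reStructuredText parser in place of the raw markdown.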

The recent problem was that recommonmark follows the CommonMark spec, an ongoing effort to standardize markdown, which has so far lacked a common standard. Because that process is still under way, the recommonmark library does not yet support extensions for features of the various markdown flavours that are not in the core CommonMark spec. We could have settled for supporting only the markdown features in the core spec, but tables are not part of it, and that was problematic. We had to support tables because they are widely used in the docs found in GitHub repositories, where GFM (GitHub Flavoured Markdown) renders ASCII tables nicely.

The solution was to use a combination of recommonmark and pandoc. recommonmark provides an eval_rst code block which can be used to embed non-section reStructuredText within markdown. I created a new MarkdownParser class which inherits from recommonmark's CommonMarkParser class. Within it, using regular expressions, I convert any text between `<!-- markdown+ -->` and `<!-- endmarkdown+ -->` into reStructuredText and enclose it in an eval_rst code block. As a result, tables enclosed within those trigger HTML comments are converted to reST tables and wrapped in an eval_rst block, which recommonmark then renders properly. Below is a snippet which shows how this was implemented.

import re
from recommonmark.parser import CommonMarkParser
from md2rst import md2rst

# Raw string so that \s and \+ reach the regex engine unchanged.
MARKDOWN_PLUS_REGEX = re.compile(r'<!--\s+markdown\+\s+-->(.*?)<!--\s+endmarkdown\+\s+-->', re.DOTALL)
EVAL_RST_TEMPLATE = "```eval_rst\n{content}\n```"

def preprocess_markdown(inputstring):
    # Replace every marked region with its reST equivalent wrapped in eval_rst.
    def callback(match_object):
        text = match_object.group(1)
        return EVAL_RST_TEMPLATE.format(content=md2rst(text))

    return re.sub(MARKDOWN_PLUS_REGEX, callback, inputstring)

class MarkdownParser(CommonMarkParser):
    def parse(self, inputstring, document):
        # Preprocess first, then let recommonmark parse the rewritten markdown.
        content = preprocess_markdown(inputstring)
        CommonMarkParser.parse(self, content, document)
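
To see what the preprocessing step does to a document, the sketch below runs the same regular expression on a small sample without needing recommonmark or pandoc. The converter `fake_md2rst` is a hypothetical stand-in for `md2rst` that only tags the text, so the output shows where the conversion result would go rather than a real reST table.

```python
import re

MARKDOWN_PLUS_REGEX = re.compile(
    r'<!--\s+markdown\+\s+-->(.*?)<!--\s+endmarkdown\+\s+-->', re.DOTALL)

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
EVAL_RST_TEMPLATE = FENCE + "eval_rst\n{content}\n" + FENCE


def fake_md2rst(text):
    # Hypothetical stand-in for md2rst: marks the text instead of converting it.
    return "[rst] " + text.strip()


def preprocess_markdown(inputstring, convert=fake_md2rst):
    # Wrap the converted contents of each marked region in an eval_rst block.
    def callback(match_object):
        return EVAL_RST_TEMPLATE.format(content=convert(match_object.group(1)))

    return MARKDOWN_PLUS_REGEX.sub(callback, inputstring)


sample = (
    "# Title\n\n"
    "<!-- markdown+ -->\n"
    "| a | b |\n|---|---|\n| 1 | 2 |\n"
    "<!-- endmarkdown+ -->\n\n"
    "Plain text.\n"
)
result = preprocess_markdown(sample)
print(result)
```

Text outside the trigger comments passes through untouched, while the trigger comments themselves are consumed and replaced by the eval_rst block.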