Improving Custom PyPI Theme Support In Yaydoc

Yaydoc has supported custom themes since nearly its inception. For themes it could not find locally, it would automatically attempt a pip installation and set up the appropriate metadata about the theme in the generated configuration. This was one of the first major enhancements we provided over using bare Sphinx to generate documentation. Since then, a large number of features have been added to ease the process of documentation generation, but the core theming aspects have remained unchanged.

To use a theme, Sphinx needs the exact name of the theme and the absolute path to it. To obtain this metadata, the existing implementation accessed the __file__ attribute of the imported package to get the absolute path to its __init__.py file, a necessary element of all Python packages. From there we searched for a file named theme.conf; the directory containing that file was our required theme.

There were a few mistakes in our earlier implementation. For starters, we assumed that the distribution name of the theme on PyPI and the package name to be imported would be the same. This is generally true, but not necessarily so. One such theme from PyPI is Flask-Sphinx-Themes. While you need to install it using

pip install Flask-Sphinx-Themes

yet to import it in a module one needs to

import flask_sphinx_themes

This led to build errors when themes like this were used. To solve this, we used the pkg_resources package. It allows us to get various metadata about a package in an abstract way, without needing to specifically handle whether the package is zipped or not.

try:
    dist = pkg_resources.get_distribution('{{ html_theme }}')
    top_level = list(dist._get_metadata('top_level.txt'))[0]
    dist_path = os.path.join(dist.location, top_level)
except (pkg_resources.DistributionNotFound, IndexError):
    print("\nError with distribution {0}".format('{{ html_theme }}'))
    html_theme = 'fossasia_theme'
    html_theme_path = ['_themes']

The idea here is that instead of searching for theme.conf, we read the name of the top-level directory from the first entry of top_level.txt, a file created by setuptools when installing the package. We build the path by joining the location attribute of the Distribution object with the name of the top-level directory. The advantage of this approach is that we don't need to import anything, and thus no longer need to know the exact package name.
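On newer Pythons the same metadata can also be read with the standard library's importlib.metadata instead of pkg_resources. This is an illustrative alternative rather than what Yaydoc itself does; the function name here is mine:

```python
from importlib import metadata

def top_level_package(dist_name):
    # Read the top_level.txt file that setuptools writes at install time;
    # its first entry is the importable package name for the distribution.
    try:
        text = metadata.distribution(dist_name).read_text('top_level.txt')
    except metadata.PackageNotFoundError:
        return None
    return text.split()[0] if text else None

# An uninstalled distribution simply yields None instead of raising:
assert top_level_package('no-such-theme-distribution') is None
```

Either way, the lookup never has to import the theme package itself.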

With this update, support for custom themes has been greatly improved.



gh-pages Publishing in Yaydoc’s Web UI

A few weeks back we rolled out the web interface for Yaydoc. The web UI enables users to generate documentation with one click and download the generated documentation in zipped form. We have now added the additional feature of deploying the generated documentation to GitHub Pages. In order to push the generated documentation, we have to obtain an access token for the user, so I used Passport's GitHub strategy to get it.

passport.use(new githubStrategy({
  clientID: process.env.CLIENTID,
  clientSecret: process.env.CLIENTSECRET,
  callbackURL: process.env.CALLBACKURL
}, function (accessToken, refreshToken, profile, cb) {
  profile.token = accessToken;
  cb(null, profile);
}));

passport.serializeUser(function(user, cb) {
  cb(null, user);
});

passport.deserializeUser(function(obj, cb) {
  cb(null, obj);
});
After setting the necessary environment variables, we have to pass the strategy to the express handler.

router.get("/github", function (req, res, next) {
  req.session.uniqueId = req.query.uniqueId;
  req.session.gitURL = req.query.gitURL;
  next();
}, passport.authenticate('github', {
  scope: [
To maintain state, I keep the necessary information in the session so that in the callback URL we know which repository we have to push to.


router.get("/callback", passport.authenticate('github'), function (req, res, next) {
  req.session.username = req.user.username;
  req.session.token = req.user.token;
  res.redirect("/deploy");
});

router.get("/deploy", function (req, res, next) {
  res.render("deploy", {
    gitURL: req.session.gitURL,
    uniqueId: req.session.uniqueId,
    token: crypter.encrypt(req.session.token),
    username: req.session.username
  });
});

GitHub sends the access token to our callback. After this, I template the necessary information into the Jade deploy template, where it invokes the deploy function via sockets. We then stream all the bash output logs to the website.

io.on('connection', function(socket){
  socket.on('execute', function (formData) {
    generator.executeScript(socket, formData);
  });
  socket.on('deploy', function (data) {
    ghdeploy.deployPages(socket, data);
  });
});
exports.deployPages = function (socket, data) {
  var donePercent = 0;
  var repoName = data.gitURL.split("/")[4].split(".")[0];
  var webUI = "true";
  var username = data.username;
  var oauthToken = crypter.decrypt(data.encryptedToken);
  const args = [
    "-i", data.uniqueId,
    "-w", webUI,
    "-n", username,
    "-o", oauthToken,
    "-r", repoName
  ];
  var process = spawn("./", args);

  process.stdout.on('data', function (data) {
    socket.emit('deploy-logs', {donePercent: donePercent, data: data.toString()});
    donePercent += 18;
  });

  process.stderr.on('data', function (data) {
    socket.emit('err-logs', data.toString());
  });

  process.on('exit', function (code) {
    console.log('child process exited with code ' + code);
    if (code === 0) {
      socket.emit('deploy-success', {pagesURL: "https://" + data.username + "" + repoName});
    }
  });
};

Once the documentation is pushed to gh-pages, the documentation URL is appended to the web UI.



Using Express to show previews in Yaydoc

In Yaydoc's web UI, documentation is generated using Sphinx and can be downloaded as a zip file. If users want a preview of the documentation, they have to unzip the file and check the generated website for each and every build. So we decided to implement a preview feature that shows the generated documentation, giving users an idea of how the website will look. Since the web UI is built with Express, we implemented the preview using Express's static function. The static function is mostly used for serving static assets, but we use it to serve our generated sites, since all of them are static. Each generated documentation gets a unique id, generated as per the uuid4 spec, and the generated documents are saved and moved into a folder named after that unique id.

mv $BUILD_DIR/_build/html $ROOT_DIR/../${UNIQUEID}_preview && cd $_/../

var express = require("express");
var path = require("path");
var favicon = require("serve-favicon");
var logger = require("morgan");
var cookieParser = require("cookie-parser");
var bodyParser = require("body-parser");
var uuidV4 = require("uuid/v4");

var app = express();
app.use(bodyParser.urlencoded({ extended: false }));
app.use(cookieParser());
app.use(express.static(path.join(__dirname, "public")));
app.use("/preview", express.static(path.join(__dirname, "temp")));


The above snippet is just a basic Express server in which we have a /preview route with the Express static handler. Pass the path of your generated website as the argument, and your sites are served over the /preview route.
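As mentioned earlier, the unique ids follow the uuid4 spec. The web UI generates them with the Node uuid package shown above; purely for illustration, the equivalent in Python looks like this (the folder-name suffix mirrors the mv command above):

```python
import uuid

# A version-4 UUID is 122 random bits rendered as 36 characters, which makes
# collisions between two users' preview folders practically impossible.
unique_id = str(uuid.uuid4())
preview_folder = "{}_preview".format(unique_id)

assert uuid.UUID(unique_id).version == 4
assert len(unique_id) == 36
```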



Deploying documentations generated by Yaydoc to Heroku

There are many web applications available online that generate static websites. Among these are two unique projects developed here at FOSSASIA: the Open Event WebApp Generator and Yaydoc (an automatic documentation generation and deployment project). Since Yaydoc already supports deployment of the generated documentation to GitHub Pages, it was just a matter of time before deployment to Heroku was also supported.

Heroku is an excellent cloud-based platform used as a web application deployment service. Heroku provides most of its services free of cost and is excellent for hosting static websites, provided that a little bit of tweaking is done.

For this implementation, we use the `Platform API` provided by Heroku. Quoting its description from the documentation:

The platform API empowers developers to automate, extend and combine Heroku with other services. You can use the platform API to programmatically create apps, provision add-ons and perform other tasks that could previously only be accomplished with Heroku toolbelt or dashboard.

In order to deploy the static websites to Heroku, we need to first prepare a bundle of source code that has been compiled and is ready for execution on the Heroku runtime. This bundle is known as a Slug.

cd temp/$EMAIL/${UNIQUE_ID}_preview
mkdir -p app
cd app

curl | tar xzv > /dev/null

cp $BASE/web.js
rsync -av --progress ../ . --exclude app

cd ..
tar czfv slug.tgz ./app > /dev/null

We are using the files generated for the preview and bundling them into a slug. We also download the NodeJS runtime files, since we are deploying a static website to Heroku. Along with the static files, we bundle a NodeJS server file (web.js) that will be used to serve the static files in the application.

After preparing the slug, we publish the static web application to Heroku. For this, we start by creating a Heroku app using the command `heroku create <app-name>`. The app name is decided by the user when filling the form in the Yaydoc web app. Following that, we request Heroku to allocate a new slug for the app. After that, we upload the slug tar file to the platform.

# Create Heroku app
heroku create $APP_NAME

# Allocate a new slug
Arr=($(curl -u ":$API_KEY" -X POST \
-H 'Content-Type: application/json' \
-H 'Accept: application/vnd.heroku+json;version=3' \
-d '{"process_types":{"web":"node-v6.11.0-linux-x64/bin/node web.js"}}' \
-n${APP_NAME}/slugs | \
python -c "import sys,json; obj=json.load(sys.stdin);
print(obj['blob']['url'] + '\n' + obj['id'])"))

# Upload the slug tar file
curl -X PUT \
-H "Content-Type:" \
--data-binary @slug.tgz \
"${Arr[0]}"

After uploading the slug to Heroku, we need to release the app. This is done using the following command.

curl -u ":$API_KEY" -X POST \
-H "Accept: application/vnd.heroku+json; version=3" \
-H "Content-Type: application/json" \
-d '{"slug":"'${Arr[1]}'"}' \
Releasing the application completes the process of deployment, making the documentation generated by Yaydoc up and running at the following URL: https://<app-name>



Editing a file stored in the webserver from the Yaydoc Web App

As a developer, working on a web application, you may want your users to be able to edit a file stored in your webserver. There may be certain use cases in which this may be required. Consider, for instance, its use case in Yaydoc.

Yaydoc allows its users the feature of continuous deployment of their documentation by adding certain configurations in their .travis.yml file. It is possible for Yaydoc to achieve the editing of the Travis file from the Web App itself.

To enable such functionality in your web application, I have prepared a script using ExpressJS and Socket.IO which performs the following actions. On the client side, we define a retrieve-file event which emits a request to the server. On the server side, we handle the event by executing a retrieveContent(...) function, which uses the spawn method of child_process to run a script that retrieves the content of the file.

// Client Side
$(function () {
  var socket = io();
  $("#editor-button").click(function () {
    socket.emit("retrieve-file");
  });
});

// Server Side
io.on("connection", function (socket) {
  socket.on("retrieve-file", function () {
    retrieveContent(socket);
  });
});

var retrieveContent = function (socket) {
  var process = require("child_process").spawn("cat", [".travis.yml"]);
  process.stdout.on("data", function (data) {
    socket.emit("file-content", data);
  });
};

After the file content is retrieved from the server, we use a JavaScript editor such as ACE to edit the content of the file. After making the changes, we emit a store-content event. On the server side, we handle the event by executing a storeContent(...) function, which uses the exec method of child_process to run a bash script that stores the content back to the same file.

$(function () {
  var socket = io();
  var editor = ace.edit("editor");
  socket.on("file-content", function (data) {
    editor.setValue(data, -1);
  });
  $("#save-modal").click(function () {
    socket.emit("store-content", editor.getValue());
  });
});

io.on("connection", function (socket) {
  socket.on("store-content", function (data) {
    storeContent(socket, data);
  });
});

var storeContent = function (socket, data) {
  var script = 'truncate -s 0 .travis.yml && echo "' + data + '" >> .travis.yml';
  var process = require("child_process").exec(script);

  process.on("exit", function (code) {
    if (code === 0) {
      // emit a success event back to the client
    }
  });
};

After successful execution of the script, a success event is sent to the client side, which can then handle it.

A minimal sample of this application can be found at:


Setting up Yaydoc on Heroku

Yaydoc takes as its input the information about a user’s repository containing the documentation in Markup files and generates a static website from it. The website also includes search functionality within the documentation. It supports various built-in and custom Sphinx themes.

Since the Web User Interface is now prepared with some solid features, it was time to deploy. We chose Heroku because of the ease with which we can build and scale the application, free of cost.

Yaydoc consists of two components: a Web User Interface, and the generation and deployment scripts. Since the web UI is developed with NodeJS and the scripts involve Python modules, we need to include the following buildpacks:

  • heroku/nodejs
  • heroku/python

We need to set certain environment variables in Heroku for Yaydoc to function properly. These include:

  • CALLBACKURL – URL where Github must return to after successful authentication
  • CLIENTID – Unique Client-Id generated by Github OAuth Application
  • CLIENTSECRET – Unique Client-Secret generated by Github OAuth Application
  • ENCRYPTION_KEY – Required to encrypt Personal Access Token of the user
  • ON_HEROKU – True, since the application is deployed to Heroku
  • PYPANDOC_PANDOC – Location of Pandoc binaries
  • SECRET – A very secret token

Steps for Manual Deployment

  1. Install Heroku on your local machine.

    • If you have a Linux-based operating system, type the following command in the terminal
wget -qO- | sh
    • Login to Heroku and enter your credentials when prompted
heroku login
  2. Deploy Yaydoc to Heroku

    • Clone the original yaydoc repository or your own fork
git clone<username>/yaydoc.git
    • Move to the directory of the cloned repository
cd yaydoc/
    • Create a Heroku application using the following command
heroku create <your-app-name>
    • Add buildpacks to the application using the following commands
heroku buildpacks:set heroku/nodejs
heroku buildpacks:add --index 2 heroku/python
heroku buildpacks:add --index 3
    • Set Environment Variables using the following commands
heroku config:set CALLBACKURL=https://<your-app-name>
heroku config:set CLIENTID=<github-generated>
heroku config:set CLIENTSECRET=<github-generated>
heroku config:set ON_HEROKU=true
heroku config:set PYPANDOC_PANDOC=~/vendor/pandoc/bin/pandoc
heroku config:set SECRET=averysecrettoken
    • Now deploy your code
git push heroku master
    • Visit the app at the URL generated by its app name
heroku open

Web User Interface for Yaydoc

Yaydoc consists of two components:

  1. A configuration for various Continuous Integration software including Travis CI among others.
  2. A Web User Interface

Since the initial stage of its development, the team has been focused on developing a `documentation generation` script and a `publish to Github Pages` script. These scripts have been developed and tested by using Travis CI.

We are now at the stage in the development of the project where we can generate the documentation of a project and keep it updated with every push to the GitHub repository that contains changes to the documentation. A sample of this can be seen at which is a deployment of the documentation of Yaydoc using its own scripts.

After having enough confidence in the working of the script, we have now shifted our inclination towards developing a Web User Interface for the app. The WebUI is intended to perform various functionalities. These include, among others:

  • Generate the documentation and Download the static files in a compressed format.
  • Generate the documentation and make them available for a Preview
  • Generate the documentation and Deploy them to Heroku
  • Generate the documentation and Deploy them to web server using SFTP

NOTE:- The aforementioned functionalities are not exhaustive. Also, they are not certain to be developed if they are not fruitful for the users of Yaydoc. We do not intend to bloat the application with features and functionalities that may never be used.

Technology Stack

The first issue that comes with developing any Web Application is the selection of its technology stack. With a huge number of languages and their web application frameworks, it becomes very difficult to reach a conclusion. After a lot of discussions, NodeJS was selected.

The User Interface involves various technologies including

  1. NodeJS – A JavaScript runtime.
  2. ExpressJS – A minimal and flexible Node.js web application framework.
  3. Pug (formerly Jade) – A high-performance template engine implemented for NodeJS.
  4. Socket.IO – A JavaScript library for realtime web application that enables realtime, bi-directional communication between web clients and servers.

ExpressJS is set up using the express-generator as it prepares a proper minimal architecture which makes it easy to scale up the application. Since the HTML part of the application will be minimal, Pug was chosen as it has a very clean and easy to read syntax. The use of Socket.IO became necessary as the app has a bidirectional communication with the `GENERATE` script sending its log output to the front-end.

Components of the Web User Interface

The UI consists of a Form that asks the user to input

  1. Email address – To provide a unique identity for a user to isolate the documentation
  2. GITURL – URL of the repository which consists the docs to be generated
  3. Doc Theme – A dropdown that consists of built-in Sphinx themes.

Out of the various arguments used to generate documentation in Sphinx, following are assumed

  • AUTHOR – Name of the user/organization of the repository
  • PROJECTNAME – Name of the repository
  • DOCPATH – Documentations are assumed to be stored at “docs/”

Apart from the form, the UI also has a block that is used to display the logs while the bash script is running in the backend.

The components defined above are those that have been developed and are being tested rigorously. Since the app is constantly being developed with new features added almost daily, new components will be added to the User Interface.


Using a YAML file to read configuration options in Yaydoc

Yaydoc provides access to a lot of configurable variables which can be set as per requirements to configure various sections of the build process. You can see the entire list of variables in the project’s homepage. Till now the only way to do this was to set appropriate environment variables. Since a web user interface for yaydoc is in development, providing a clean UI was very important. This meant that we could not just create a bunch of input fields for all variables as that could be overwhelming for any new user. So we decided to ask only minimal information in the web form and read other variables if the user chooses to specify from a YAML file in the target repository.

To read a YAML file, we used PyYAML. It is a well-established Python package for safely reading info from a YAML file and converting it to a Python dictionary. Here is the code snippet for that.

def get_yaml_config():
    try:
        with open('.yaydoc.yml', 'r') as file:
            conf = yaml.safe_load(file)
    except FileNotFoundError:
        return {}
    return conf

The above code snippet returns a dictionary with all keys read from the YAML file. Since none of the options are required, we first create a dictionary with all defaults and recursively merge it with the YAML dict. The merging is done using the following code snippet:

def update_dict(base, head):
    for key, value in head.items():
        if isinstance(base, dict):
            if isinstance(value, dict):
                base[key] = update_dict(base.get(key, {}), value)
            else:
                base[key] = head[key]
        else:
            base = {key: head[key]}
    return base

Now you can create a .yaydoc.yml file in the root of your repository and yaydoc would read options from there. Here is a sample yml file.

  author: FOSSASIA
  projectname: Yaydoc
  version: development

  doctheme: fossasia_theme
  docpath: docs/
  logo: images/logo.svg
  markdown_flavour: markdown_github


It should be noted that the layout of the file may change in the future as the project is in active development.
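To make the merge semantics concrete, here is a small self-contained sketch, simplified from the recursive merge shown earlier; the option values are illustrative:

```python
def update_dict(base, head):
    # Recursively overlay 'head' (user options) on top of 'base' (defaults).
    for key, value in head.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            base[key] = update_dict(base[key], value)
        else:
            base[key] = value
    return base

defaults = {'doctheme': 'fossasia_theme', 'docpath': 'docs/'}
user = {'doctheme': 'alabaster'}
merged = update_dict(defaults, user)
# User-specified keys win; unspecified keys keep their default values.
```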



How to write your own custom AST parser?

In Yaydoc, we are using pandoc to convert text from one format to another. Pandoc is one of the best text conversion tools, helping users convert text between different markup formats. It is written in Haskell. Many wrapper libraries are available for different programming languages, including Python, NodeJS, and Ruby. But in Yaydoc, for a few particular scenarios we had to customize the conversion to meet our needs, so I started to build a custom parser. The parser I made converts a yml code block to a yaml code block, because Sphinx needs a yaml code block for rendering. In order to parse, we have to split the text into tokens suited to our needs, so initially we have to write a lexer. Here is a sample snippet for a basic lexer.

import re

class Node:
    def __init__(self, text, token):
        self.text = text
        self.token = token

    def __str__(self):
        return self.text + ' ' + self.token

def lexer(text):
    def syntax_highliter_lexer(nodes, words):
        splitted_syntax_highligter = words.split('```')
        if splitted_syntax_highligter[0] != '':
            nodes.append(Node(splitted_syntax_highligter[0], 'WORD'))
        splitted_syntax_highligter[0] = '```'
        words = ''.join([x for x in splitted_syntax_highligter])
        nodes.append(Node(words, 'SYNTAX HIGHLIGHTER'))
        return nodes

    syntax_re = re.compile('```')
    nodes = []
    pos = 0
    words = ''
    while pos < len(text):
        if text[pos] == ' ':
            if len(words) > 0:
                if, words) is not None:
                    nodes = syntax_highliter_lexer(nodes, words)
                else:
                    nodes.append(Node(words, 'WORD'))
                words = ''
            nodes.append(Node(text[pos], 'SPACE'))
            pos = pos + 1
        elif text[pos] == '\n':
            if len(words) > 0:
                if, words) is not None:
                    nodes = syntax_highliter_lexer(nodes, words)
                else:
                    nodes.append(Node(words, 'WORD'))
                words = ''
            nodes.append(Node(text[pos], 'NEWLINE'))
            pos = pos + 1
        else:
            words += text[pos]
            pos = pos + 1
    if len(words) > 0:
        if, words) is not None:
            nodes = syntax_highliter_lexer(nodes, words)
        else:
            nodes.append(Node(words, 'WORD'))
    return nodes

After converting the text into tokens, we have to parse them to match our needs. In this case we only need to build a simple parser.

I chose an abstract syntax tree (AST) approach to build the parser. An AST is a simple tree rooted at an expression node. The left node is evaluated first, then the right node. If there is only one node after the root node, its value is simply returned. Sample snippet for the AST parser:

def parser(nodes, index):
    if nodes[index].token == 'NEWLINE':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'WORD':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'SYNTAX HIGHLIGHTER':
        if index + 1 < len(nodes):
            word = ''
            j = index + 1
            end_highligher = False
            end_pos = 0
            while j < len(nodes):
                if nodes[j].token == 'SYNTAX HIGHLIGHTER':
                    end_pos = j
                    end_highligher = True
                j = j + 1
            if end_highligher:
                for k in range(index, end_pos + 1):
                    word += nodes[k].text
                if index != 0:
                    if nodes[index - 1].token != 'NEWLINE':
                        word = '\n' + word
                if end_pos + 1 < len(nodes):
                    if nodes[end_pos + 1].token != 'NEWLINE':
                        word = word + '\n'
                    return word + parser(nodes, end_pos + 1)
                else:
                    return word
            else:
                return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'SPACE':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text

We didn't end up using this parser in Yaydoc, because maintaining a custom parser is a huge hurdle, but it provided a good learning experience.
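As a toy illustration of the left-then-right evaluation described above (not Yaydoc code), consider a minimal arithmetic AST:

```python
class Tree:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def evaluate(node):
    # Leaves return their value; internal nodes evaluate the left subtree
    # first, then the right, then apply their own operator.
    if node.left is None and node.right is None:
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return {'+': left + right, '*': left * right}[node.value]

tree = Tree('+', Tree(2), Tree('*', Tree(3), Tree(4)))
result = evaluate(tree)  # 2 + (3 * 4) == 14
```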



How to add a custom filter to pypandoc

In Yaydoc, we met the problem of converting Markdown files into reStructuredText, because Sphinx needs reStructuredText.

Let us say we have a yml CodeBlock in Yaydoc's documentation, but Sphinx uses Pygments for code highlighting, which needs yaml instead of yml for proper documentation generation. Pandoc has an excellent feature that allows us to add our own custom logic to the AST parsing.

INPUT --reader--> AST --filter--> AST --writer--> OUTPUT

Let me explain this in a few steps:

  1. Initially pandoc reads the file and then converts it into nodes.
  2. Then the nodes are sent to the Pandoc AST for parsing the Markdown into reStructuredText.
  3. The parsed nodes then go through the filter. The filter converts the parsed nodes according to the logic implemented in it.
  4. Then the Pandoc AST performs further parsing, joins the nodes into text, and writes it to the file.

One important point to remember is that pandoc reads the conversion from the filter's output stream, so don't write print statements in the filter. If you do, pandoc cannot parse the JSON. For debugging, you can use the logging module from the Python standard library. Here is a sample pypandoc filter:

#!/usr/bin/env python
from pandocfilters import CodeBlock, toJSONFilter

def yml_to_yaml(key, val, fmt, meta):
    if key == 'CodeBlock':
        [[indent, classes, keyvals], code] = val
        if len(classes) > 0:
            if classes[0] == u'yml':
                classes[0] = u'yaml'
        val = [[indent, classes, keyvals], code]
        return CodeBlock(val[0], val[1])

if __name__ == "__main__":
    toJSONFilter(yml_to_yaml)
The above snippet checks whether a node is a CodeBlock. If it is, it changes yml to yaml and emits the block as JSON on the output stream, which is then parsed by pandoc.
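The core transformation is easy to exercise in isolation. Here is a hedged sketch of just the class-renaming step, with the pandocfilters plumbing stripped away; the payload layout mirrors a pandoc CodeBlock value:

```python
def rename_code_class(val, old='yml', new='yaml'):
    # val mirrors a pandoc CodeBlock payload: [[identifier, classes, keyvals], code]
    [[identifier, classes, keyvals], code] = val
    if len(classes) > 0 and classes[0] == old:
        classes[0] = new
    return [[identifier, classes, keyvals], code]

block = [['', ['yml'], []], 'doctheme: fossasia_theme']
converted = rename_code_class(block)
# converted[0][1] is now ['yaml']; the code text itself is untouched.
```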

Finally, all you have to do is add your filter to pypandoc's filters argument.

output = pypandoc.convert_text(text, 'rst', format='md', filters=[os.path.join('filters', '')])

