Editing a file stored in the webserver from the Yaydoc Web App

As a developer working on a web application, you may want your users to be able to edit a file stored on your web server. Yaydoc has exactly this use case.

Yaydoc lets its users set up continuous deployment of their documentation by adding certain configuration to their .travis.yml file. With the approach described here, the Travis file can be edited from the Web App itself.

To support this functionality in a web application, I have prepared a script using ExpressJS and Socket.IO. On the client side, we define a retrieve-file event which emits a request to the server. On the server side, we handle the event by executing a retrieveContent(...) function, which uses the spawn method of child_process to run a command that retrieves the content of the file.

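// Client Side
$(function () {
  var socket = io();
  $("#editor-button").click(function () {
    // Ask the server for the current content of the file
    socket.emit("retrieve-file");
  });
});

// Server Side
io.on("connection", function (socket) {
  socket.on("retrieve-file", function () {
    retrieveContent(socket);
  });
});

var retrieveContent = function (socket) {
  var child = require("child_process").spawn("cat", [".travis.yml"]);
  child.stdout.on("data", function (data) {
    // data is a Buffer; send it to the client as text
    socket.emit("file-content", data.toString());
  });
};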

After the file content is retrieved from the server, we use a JavaScript editor such as ACE to edit it. Once the changes are made, we emit a store-content event. On the server side, we handle the event by executing a storeContent(...) function, which uses the exec method of child_process to run a shell command that writes the content back to the same file.

// Client Side
$(function () {
  var socket = io();
  var editor = ace.edit("editor");
  socket.on("file-content", function (data) {
    editor.setValue(data, -1);
  });
  $("#save-modal").click(function () {
    socket.emit("store-content", editor.getValue());
  });
});

// Server Side
io.on("connection", function (socket) {
  socket.on("store-content", function (data) {
    storeContent(socket, data);
  });
});

var storeContent = function (socket, data) {
  // Note: data is interpolated into the shell command without escaping
  var script = 'truncate -s 0 .travis.yml && echo "' + data + '" >> .travis.yml';
  var child = require("child_process").exec(script);

  child.on("exit", function (code) {
    if (code === 0) {
      // Notify the client that the file was stored successfully
    }
  });
};

After the script executes successfully, a success event is sent to the client side, where it can then be handled.

A minimal sample of this application can be found at: https://github.com/imujjwal96/socket-editor


Setting up Yaydoc on Heroku

Yaydoc takes as its input the information about a user’s repository containing the documentation in Markup files and generates a static website from it. The website also includes search functionality within the documentation. It supports various built-in and custom Sphinx themes.

Since the Web User Interface is now equipped with some solid features, it was time to deploy. We chose Heroku because of the ease with which we can build and scale the application free of cost.

Yaydoc consists of two components: a Web User Interface and the Generation and Deployment Scripts. Because the Web UI is developed with NodeJS and the scripts involve Python modules, we need to include the following buildpacks:

  • heroku/nodejs
  • heroku/python
  • https://github.com/imujjwal96/heroku-buildpack-pandoc.git

We need to set certain environment variables in Heroku for Yaydoc to function properly. These include:

  • CALLBACKURL – URL where Github must return to after successful authentication
  • CLIENTID – Unique Client-Id generated by Github OAuth Application
  • CLIENTSECRET – Unique Client-Secret generated by Github OAuth Application
  • ENCRYPTION_KEY – Required to encrypt Personal Access Token of the user
  • ON_HEROKU – True, since the application is deployed to Heroku
  • PYPANDOC_PANDOC – Location of Pandoc binaries
  • SECRET – A very secret token

Steps for Manual Deployment

  1. Install Heroku on your local machine.

    • If you have a Linux-based operating system, type the following commands in the terminal
wget -qO- https://cli-assets.heroku.com/install-ubuntu.sh | sh
heroku login
    • Enter your credentials and log in.

  2. Deploy Yaydoc to Heroku

    • Clone the original yaydoc repository or your own fork
git clone https://github.com/<username>/yaydoc.git
    • Move to the directory of the cloned repository
cd yaydoc/
    • Create a Heroku application using the following command
heroku create <your-app-name>
    • Add buildpacks to the application using the following commands
heroku buildpacks:set heroku/nodejs
heroku buildpacks:add --index 2 heroku/python
heroku buildpacks:add --index 3 https://github.com/imujjwal96/heroku-buildpack-pandoc.git
    • Set Environment Variables using the following commands
heroku config:set CALLBACKURL=https://<your-app-name>.herokuapp.com/callback
heroku config:set CLIENTID=<github-generated>
heroku config:set CLIENTSECRET=<github-generated>
heroku config:set ON_HEROKU=true
heroku config:set PYPANDOC_PANDOC=~/vendor/pandoc/bin/pandoc
heroku config:set SECRET=averysecrettoken
    • Now deploy your code
git push heroku master
    • Visit the app at the URL generated by its app name
heroku open

Web User Interface for Yaydoc

Yaydoc consists of two components:

  1. A configuration for various Continuous Integration software including Travis CI among others.
  2. A Web User Interface

Since the initial stage of its development, the team has been focused on developing a `documentation generation` script and a `publish to Github Pages` script. These scripts have been developed and tested by using Travis CI.

We are now at a stage in the project's development where we can generate the documentation of a project and keep it updated with every push to the Github repository that changes the documentation. A sample can be seen at https://yaydoc.fossasia.org, which is a deployment of Yaydoc's own documentation using its own scripts.

Having gained enough confidence in the scripts, we have now shifted our focus towards developing a Web User Interface for the app. The Web UI is intended to provide several functions. These include, among others:

  • Generate the documentation and download the static files in a compressed format
  • Generate the documentation and make it available for a preview
  • Generate the documentation and deploy it to Heroku
  • Generate the documentation and deploy it to a web server using SFTP

NOTE:- The aforementioned functionalities are not exhaustive. Also, they are not certain to be developed if they are not fruitful for the users of Yaydoc. We do not intend to bloat the application with features and functionalities that may never be used.

Technology Stack

The first issue that comes with developing any Web Application is the selection of its technology stack. With a huge number of languages and their web application frameworks, it becomes very difficult to reach a conclusion. After a lot of discussions, NodeJS was selected.

The User Interface involves various technologies including

  1. NodeJS – A JavaScript runtime.
  2. ExpressJS – A minimal and flexible Node.js web application framework.
  3. Pug (ex – Jade) – A high-performance template engine implemented for NodeJS.
  4. Socket.IO – A JavaScript library for realtime web applications that enables bi-directional communication between web clients and servers.

ExpressJS is set up using the express-generator as it prepares a proper minimal architecture which makes it easy to scale up the application. Since the HTML part of the application will be minimal, Pug was chosen as it has a very clean and easy to read syntax. The use of Socket.IO became necessary as the app has a bidirectional communication with the `GENERATE` script sending its log output to the front-end.

Components of the Web User Interface

The UI consists of a Form that asks the user to input

  1. Email address – To provide a unique identity for a user to isolate the documentation
  2. GITURL – URL of the repository that contains the docs to be generated
  3. Doc Theme – A dropdown that lists the built-in Sphinx themes.

Out of the various arguments used to generate documentation in Sphinx, the following are assumed

  • AUTHOR – Name of the user/organization of the repository
  • PROJECTNAME – Name of the repository
  • DOCPATH – Documentations are assumed to be stored at “docs/”

Apart from the form, the UI also has a block that is used to display the logs while the bash script is running in the backend.

The components defined above are those that have been developed and are being tested rigorously. Since the app is constantly being developed with new features added almost daily, new components will be added to the User Interface.


Using a YAML file to read configuration options in Yaydoc

Yaydoc provides access to a lot of configurable variables which can be set as required to configure various sections of the build process. You can see the entire list of variables on the project's homepage. Until now, the only way to set them was through environment variables. Since a web user interface for Yaydoc is in development, providing a clean UI was very important. We could not just create a bunch of input fields for all variables, as that could overwhelm a new user. So we decided to ask for only minimal information in the web form and to read any other variables the user chooses to specify from a YAML file in the target repository.

To read a YAML file, we used PyYAML. It is a well-established Python package that safely reads data from a YAML file and converts it to a Python dictionary. Here is the code snippet for that.

import yaml

def get_yaml_config():
    try:
        with open('.yaydoc.yml', 'r') as file:
            conf = yaml.safe_load(file)
    except FileNotFoundError:
        return {}
    return conf

The above code snippet returns a dictionary containing all keys read from the YAML file. Since none of the options are required, we first create a dictionary with all the defaults and recursively merge the YAML dictionary into it. The merging is done using the following code snippet:

def update_dict(base, head):
    for key, value in head.items():
        if isinstance(base, dict):
            if isinstance(value, dict):
                # Merge nested dictionaries key by key
                base[key] = update_dict(base.get(key, {}), value)
            else:
                base[key] = head[key]
        else:
            base = {key: head[key]}
    return base
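
As an illustration of how such a merge behaves, here is a small self-contained sketch. The `defaults` and `user_conf` dictionaries and their keys are made up for the example and are not Yaydoc's real option layout; `update_dict` follows the merging logic described above.

```python
import copy


def update_dict(base, head):
    """Recursively merge head into base; values from head win."""
    for key, value in head.items():
        if isinstance(base, dict):
            if isinstance(value, dict):
                base[key] = update_dict(base.get(key, {}), value)
            else:
                base[key] = value
        else:
            base = {key: value}
    return base


# Hypothetical defaults and user-supplied options (illustrative keys only).
defaults = {'build': {'doctheme': 'alabaster', 'docpath': 'docs/'},
            'version': 'latest'}
user_conf = {'build': {'doctheme': 'fossasia_theme'}}

# Merge into a copy so the defaults themselves stay untouched.
merged = update_dict(copy.deepcopy(defaults), user_conf)
print(merged)
```

Only the keys the user supplies are overridden; sibling keys such as `docpath` keep their default values because nested dictionaries are merged rather than replaced.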

Now you can create a .yaydoc.yml file in the root of your repository and Yaydoc will read options from there. Here is a sample yml file.

  author: FOSSASIA
  projectname: Yaydoc
  version: development

  doctheme: fossasia_theme
  docpath: docs/
  logo: images/logo.svg
  markdown_flavour: markdown_github

    docurl: yaydoc.fossasia.org

It should be noted that the layout of the file may change in the future as the project is in active development.



How to write your own custom AST parser?

In Yaydoc, we use pandoc to convert text from one format to another. Pandoc is one of the best text conversion tools and helps users convert text between different markup formats. It is written in Haskell, and wrapper libraries are available for several programming languages, including Python, Node.js and Ruby. In Yaydoc, however, a few particular scenarios require us to customize the conversion, so I started to build a custom parser. The parser converts yml code blocks to yaml code blocks, because Sphinx needs yaml code blocks for rendering. In order to parse the text, we first have to split it into tokens, so we start by writing a lexer. Here is a sample snippet for a basic lexer.

import re

class Node:
    def __init__(self, text, token):
        self.text = text
        self.token = token
    def __str__(self):
        return self.text+' '+self.token

def lexer(text):
    def syntax_highliter_lexer(nodes, words):
        # Split off any text that precedes the code fence marker
        splitted_syntax_highligter = words.split('```')
        if splitted_syntax_highligter[0] != '':
            nodes.append(Node(splitted_syntax_highligter[0], 'WORD'))
        splitted_syntax_highligter[0] = '```'
        words = ''.join([x for x in splitted_syntax_highligter])
        nodes.append(Node(words, 'SYNTAX HIGHLIGHTER'))
        return nodes
    syntax_re = re.compile('```')
    nodes = []
    pos = 0
    words = ''
    while pos < len(text):
        if text[pos] == ' ':
            if len(words) > 0:
                if syntax_re.search(words) is not None:
                    nodes = syntax_highliter_lexer(nodes, words)
                else:
                    nodes.append(Node(words, 'WORD'))
                words = ''
            nodes.append(Node(text[pos], 'SPACE'))
            pos = pos + 1
        elif text[pos] == '\n':
            if len(words) > 0:
                if syntax_re.search(words) is not None:
                    nodes = syntax_highliter_lexer(nodes, words)
                else:
                    nodes.append(Node(words, 'WORD'))
                words = ''
            nodes.append(Node(text[pos], 'NEWLINE'))
            pos = pos + 1
        else:
            # Any other character extends the current word
            words += text[pos]
            pos = pos + 1
    if len(words) > 0:
        if syntax_re.search(words) is not None:
            nodes = syntax_highliter_lexer(nodes, words)
        else:
            nodes.append(Node(words, 'WORD'))
    return nodes
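
For comparison, similar token categories can be produced with a single regular expression. This is an alternative sketch, not the lexer above: the function name `tokenize`, the token name `FENCE` (standing in for the code fence token) and the sample input are all assumptions made for illustration.

```python
import re

# One alternative per token kind; order matters because the first match wins.
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n)"
    r"|(?P<SPACE> )"
    r"|(?P<FENCE>```[^ \n]*)"   # a code fence plus an optional language tag
    r"|(?P<WORD>[^ \n`]+|`)"    # any other run of characters, or a lone backtick
)


def tokenize(text):
    """Return (token_kind, token_text) pairs covering the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


tokens = tokenize("see ```yml\nkey: 1\n```")
for kind, value in tokens:
    print(kind, repr(value))
```

Because every character belongs to exactly one alternative, joining the token texts gives back the original input, which is a convenient sanity check for any lexer of this kind.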

After converting the text into tokens, we have to parse the tokens to match our needs. In this case, a simple parser is enough.

I chose an abstract syntax tree (AST) to build the parser. An AST is a simple tree rooted at an expression node. The left node is evaluated first, then the right node. If only one node follows the root node, its value is simply returned. Here is a sample snippet for the AST parser:

def parser(nodes, index):
    if nodes[index].token == 'NEWLINE':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'WORD':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'SYNTAX HIGHLIGHTER':
        if index + 1 < len(nodes):
            word = ''
            j = index + 1
            end_highligher = False
            end_pos = 0
            # Look for a closing code fence after the opening one
            while j < len(nodes):
                if nodes[j].token == 'SYNTAX HIGHLIGHTER':
                    end_pos = j
                    end_highligher = True
                j = j + 1
            if end_highligher:
                for k in range(index, end_pos + 1):
                    word += nodes[k].text
                # Make sure the code block starts and ends on its own line
                if index != 0:
                    if nodes[index - 1].token != 'NEWLINE':
                        word = '\n' + word
                if end_pos + 1 < len(nodes):
                    if nodes[end_pos + 1].token != 'NEWLINE':
                        word = word + '\n'
                    return word + parser(nodes, end_pos + 1)
                else:
                    return word
            else:
                return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text
    elif nodes[index].token == 'SPACE':
        if index + 1 < len(nodes):
            return nodes[index].text + parser(nodes, index + 1)
        else:
            return nodes[index].text

In the end, we did not use the parser in Yaydoc because maintaining a custom parser is a huge hurdle, but it provided a good learning experience.



How to add a custom filter to pypandoc

In Yaydoc, we ran into the problem of converting Markdown files into reStructuredText, because Sphinx needs reStructuredText.

Let us say we have a yml CodeBlock in Yaydoc’s README.md. Sphinx uses Pygments for code highlighting, which needs yaml instead of yml for proper documentation generation. Pandoc has an excellent feature that allows us to apply our own custom logic to its AST through filters.

INPUT --reader--> AST --filter--> AST --writer--> OUTPUT

Let me explain this in a few steps:

  1. Initially, pandoc reads the file and converts it into nodes.
  2. The nodes are then sent to the Pandoc AST for parsing the Markdown to reStructuredText.
  3. The parsed nodes then go to the filter, which converts them according to the logic implemented.
  4. The Pandoc AST then performs further parsing, joins the nodes into text and writes it to the file.

One important point to remember is that pandoc reads the converted AST from the filter's output stream, so don’t write print statements in the filter. If you do, pandoc cannot parse the JSON. For debugging, you can use the logging module from the Python standard library. Here is a sample pypandoc filter:

#!/usr/bin/env python
from pandocfilters import CodeBlock, toJSONFilter

def yml_to_yaml(key, val, fmt, meta):
    if key == 'CodeBlock':
        [[indent, classes, keyvals], code] = val
        if len(classes) > 0:
            if classes[0] == u'yml':
                classes[0] = u'yaml'
        val = [[indent, classes, keyvals], code]
        return CodeBlock(val[0], val[1])

if __name__ == "__main__":
    toJSONFilter(yml_to_yaml)

The above snippet checks whether the node is a CodeBlock or not. If it is a CodeBlock, it changes yml to yaml and prints it as a JSON in the output stream. It is then parsed by pandoc.
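
To see what such a filter does to a single node without running pandoc, the sketch below applies the same renaming to a hand-written CodeBlock value. It deliberately avoids importing pandocfilters; the function `rename_yml`, the logger name and the sample node are hypothetical. Diagnostics go through `logging`, configured here to write to standard error, which leaves standard output free for the JSON that pandoc reads.

```python
import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
log = logging.getLogger('yml_filter')  # hypothetical logger name


def rename_yml(key, val):
    """Return a CodeBlock value with a leading 'yml' class renamed to 'yaml'."""
    if key != 'CodeBlock':
        return None  # leave every other node type untouched
    [[ident, classes, keyvals], code] = val
    if classes and classes[0] == 'yml':
        log.debug('renaming yml code block')  # written to stderr, not stdout
        classes = ['yaml'] + classes[1:]
    return [[ident, classes, keyvals], code]


# A CodeBlock value has the shape [[identifier, classes, key-value pairs], code].
sample = [['', ['yml'], []], 'author: FOSSASIA']
print(rename_yml('CodeBlock', sample))
```

Returning `None` for other node types mirrors the convention used by pandoc filters, where no return value means the node is left unchanged.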

Finally, all you have to do is to add your filter to the Pypandoc’s filters argument.

output = pypandoc.convert_text(text, 'rst', format='md', filters=[os.path.join('filters', 'yml_filter.py')])



Pipelining Bash Script’s output to Webapp using Socket.io

Yaydoc, our automatic documentation generator, among other components, consists of a Web User Interface. This UI has a form that takes as its input certain information about a user’s project and generates documentation from it in the backend with the help of a Bash script. The caveat of executing such a Bash script is that the user has to wait for the processing to complete before getting any output in the web app. This is a problem, as the user may not know whether the process is executing properly. Furthermore, servers used to deploy such web applications allow only a limited time span within which a response must be sent to a received GET or POST request. Since executing scripts may take some time, the process may lead to a request timeout.

We faced this problem with Yaydoc while deploying it to Heroku. Since Heroku has a timeout of 30 seconds, executing the documentation generation script led to a request timeout, as the script takes more than 30 seconds to run. After a bit of research, we were introduced to Socket.IO, one of the most powerful JavaScript frameworks, which enables real-time, bidirectional, event-based communication.

At the client side, we define an “execute” event which emits the form data when the “Generate Docs” button is clicked. At the server side, we handle the event by executing a generator.executeScript(...) function with the socket and formData as its arguments.

// Client-side Event Handling
$(function () {
  var socket = io();
  $("#btnGenerate").click(function () {
    var formData = getData();
    socket.emit("execute", formData);
  });
});

// Server-side Event Handling
io.on("connection", function (socket) {
  socket.on("execute", function (formData) {
    generator.executeScript(socket, formData);
  });
});


Bash scripts are executed in NodeJS by creating child processes using the `child_process` module. This module provides four different methods for executing external applications. They are:

  1. execFile
  2. exec
  3. spawn
  4. fork

Of these, the exec() and execFile() methods buffer the output and hand it over only once the script has finished. We cannot use them because we need to receive responses from the server continuously, as the commands in the script execute. Thus, we opt for spawn(), which returns a child process object whose stdout stream emits a data event every time the script produces output. The spawn method is called in the executeScript method.

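var spawn = require("child_process").spawn;
exports.executeScript = function (socket, formData) {
  // args holds the script arguments built from formData (construction not shown)
  var child = spawn("./generate.sh", args);
  child.stdout.on("data", function (data) {
    socket.emit("logs", {data: data});
  });
};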

The emitted logs are then received at the client-side for display in the web application.

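// Client-side Event Handling (socket is the connection created earlier)
$(function () {
  socket.on("logs", function (data) {
    // Display the received log output in the web application
  });
});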
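
The difference between buffered and streamed output is not specific to Node.js. The following Python sketch is only an analogy and not part of Yaydoc: it contrasts `subprocess.run`, where the output is available only after the process exits, with reading the `stdout` of `subprocess.Popen` line by line. The child command and the `on_line` callback are made up for the example; in the web app, the callback's role is played by `socket.emit`.

```python
import subprocess
import sys

# A stand-in for generate.sh: a child process that prints three log lines.
CHILD = [sys.executable, '-c',
         'for i in range(3): print("step", i, flush=True)']


def run_buffered():
    """Wait for the process to finish, then return all of its output at once."""
    result = subprocess.run(CHILD, capture_output=True, text=True, check=True)
    return result.stdout


def run_streamed(on_line):
    """Call on_line for each line as soon as the child process writes it."""
    with subprocess.Popen(CHILD, stdout=subprocess.PIPE, text=True) as child:
        for line in child.stdout:
            on_line(line.rstrip('\n'))
    return child.returncode


received = []
exit_code = run_streamed(received.append)
print(received, exit_code)
```

With the streamed variant, each line can be forwarded to the browser while the script is still running, which is what keeps the user informed and avoids waiting on a single long request.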

A minimal sample of this application can be found at: https://github.com/imujjwal96/socket-bashing


Automatically Generating index for documentation in Yaydoc

Yaydoc, which uses the Sphinx documentation generator internally, needs a document named index.rst that describes the overall layout of the documentation in order to generate a proper table of contents. Without an index.rst, the build fails. With this week’s update, that constraint has been relaxed: if Yaydoc detects that index.rst has not been supplied, it automatically generates a minimal index for basic use. Although providing your own index is still recommended, you won’t be punished for its absence. The following sections show how this was implemented and show the feature in action.


For generating a minimal index.rst, we perform the following steps:

  • If the repository has a README.rst or a README.md, we include it in the index
  • Several toctrees are generated as per how the documents in the repository are arranged.

The following code snippet returns a valid rst block which includes the document dirpath/filename

import os

def get_include(dirpath, filename):
    ext = os.path.splitext(filename)[1]
    if ext == '.md':
        directive = 'mdinclude'
    else:
        directive = 'include'
    template = '.. {directive}:: {document}'
    path = os.path.relpath(os.path.join(dirpath, filename))
    document = path.replace(os.path.sep, '/')
    return template.format(directive=directive, document=document)

The following code snippet returns a valid rst block which creates a toctree of dirpath.

def get_toctree(dirpath, filenames):
    toctree = ['.. toctree::', '   :maxdepth: 1']
    caption_template = '   :caption: {caption}'
    content_template = '   {document}'

    caption = os.path.basename(dirpath).replace('_', ' ').title()
    if caption == os.curdir:
        caption = 'Contents'
    toctree.append(caption_template.format(caption=caption))
    # Inserting a blank line
    toctree.append('')

    valid = False
    for filename in filenames:
        path, ext = os.path.splitext(os.path.join(dirpath, filename))
        if ext not in ('.md', '.rst'):
            # Skip anything that is not a Markdown or reStructuredText file
            continue
        document = path.replace(os.path.sep, '/')
        document = document.lstrip('./').rstrip('/')
        toctree.append(content_template.format(document=document))
        valid = True

    if valid:
        return '\n'.join(toctree)
    else:
        return ''

The following code snippet walks the documentation directory and returns a valid content to be written to index.rst.

def get_index(root):
    index = []
    # Include README from root
    root_files = next(os.walk(root))[2]
    if 'README.rst' in root_files:
        index.append(get_include(root, 'README.rst'))
    elif 'README.md' in root_files:
        index.append(get_include(root, 'README.md'))
    # Add toctrees as per the directory structure
    for (dirpath, dirnames, filenames) in os.walk(os.curdir):
        if filenames:
            toctree = get_toctree(dirpath, filenames)
            if toctree:
                index.append(toctree)
    return '\n\n'.join(index) + '\n'


Let’s assume that a sample project has the following directory tree for documentation.

|   +---_installation_guide/
|   |   +--- setup_heroku.md
|   |   +--- setup_docker.md
|   +---_tutorial/
|   |   +--- basic.md
|   |   +--- advanced.md

The following index.rst would be generated from the above tree

.. mdinclude:: ../README.md

.. toctree::
   :caption: Installation Guide
   :maxdepth: 1


.. toctree::
   :caption: Tutorial
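
The toctree part of such an index can be reproduced end to end with a condensed variant of the approach. This is a sketch under stated assumptions, not Yaydoc's code: the names `toctree_for` and `build_index` are hypothetical, sorting is added so the output is deterministic, the README include is left out, and the sample tree is recreated in a temporary directory.

```python
import os
import tempfile


def toctree_for(dirpath, filenames, root):
    """Build one toctree block for the Markdown/reST files in dirpath."""
    caption = os.path.basename(dirpath).replace('_', ' ').strip().title()
    if dirpath == root:
        caption = 'Contents'
    lines = ['.. toctree::', '   :maxdepth: 1', '   :caption: ' + caption, '']
    for filename in sorted(filenames):
        name, ext = os.path.splitext(filename)
        if ext in ('.md', '.rst'):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            lines.append('   ' + rel.replace(os.path.sep, '/'))
    # Only the four header lines means no documents were found here.
    return '\n'.join(lines) if len(lines) > 4 else ''


def build_index(root):
    """Walk root and join one toctree block per directory with documents."""
    blocks = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        block = toctree_for(dirpath, filenames, root)
        if block:
            blocks.append(block)
    return '\n\n'.join(blocks) + '\n'


sample_tree = {'_installation_guide': ['setup_heroku.md', 'setup_docker.md'],
               '_tutorial': ['basic.md', 'advanced.md']}
with tempfile.TemporaryDirectory() as root:
    for folder, files in sample_tree.items():
        os.makedirs(os.path.join(root, folder))
        for name in files:
            open(os.path.join(root, folder, name), 'w').close()
    index_text = build_index(root)

print(index_text)
```

Each directory that contains at least one Markdown or reStructuredText file yields its own captioned toctree, and directories without documents are skipped.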


As you can see, this index.rst would be enough for most use cases. This update decreases the entry barrier for yaydoc. More features are on the way.



Using Root Directory as the Documentation Directory with Yaydoc

In our test builds for Yaydoc, we found that if we set the root as the documentation directory, the build would fail with a very long build log. During the build, we create some temporary directories in the root, such as a virtual environment and the build directory. After inspecting the build logs, we found that when the root itself is used as the documentation directory, we were accidentally copying the build directory into itself recursively, which led to the build failure. In addition, since the virtual environment directory was also being accidentally copied to the build directory, we were actually building the documentation of the entire Python standard library on each build.

Once the problem and its cause were known, the course of action was clear. We needed to ensure that none of the temporary directories we create as part of the build process were copied to the build directory. The following changes were made to achieve that.

  • The virtual environment directory is now created in the HOME directory instead of the root.
  • Any other temporary directories, except the main build directory, are now deleted before copying.
  • To prevent the recursive copying, we used the --exclude parameter of rsync.
rsync --exclude=BUILD_DIR DOCS_DIR/ BUILD_DIR/
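
The same idea can be sketched in Python with `shutil.copytree`, whose `ignore` callback plays the role of rsync's `--exclude`. This is an analogy rather than what Yaydoc runs; the directory names `build` and `venv` and the helper `copy_docs` are assumptions made for the example.

```python
import os
import shutil
import tempfile


def copy_docs(docs_dir, build_dir, excluded=('build', 'venv')):
    """Copy docs_dir into build_dir, skipping entries named in excluded."""
    shutil.copytree(docs_dir, build_dir,
                    ignore=shutil.ignore_patterns(*excluded))


with tempfile.TemporaryDirectory() as root:
    # The repository root doubles as the documentation directory.
    with open(os.path.join(root, 'index.rst'), 'w') as handle:
        handle.write('Welcome\n')
    os.makedirs(os.path.join(root, 'venv', 'lib'))

    # The build directory lives inside the directory being copied.
    build_dir = os.path.join(root, 'build')
    copy_docs(root, build_dir)
    copied = sorted(os.listdir(build_dir))

print(copied)
```

Excluding the temporary directories by name keeps both the build directory and the virtual environment out of the copy, even though the destination sits inside the source.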

After this patch, root can also be used as the documentation directory with Yaydoc. To do so, just set the environment variable DOCPATH as “.”


Deploy Static Web Pages In Six Keystrokes

I added two fairly young projects – Query Server and YayDoc to the projects list on http://labs.fossasia.org/. I pulled the code from GitHub, made the changes and it worked fine. Now to get it reviewed from a co-developer, I needed to host my changes somewhere on the web.

The fossasia-labs repository runs on gh-pages by GitHub. Hence, one way of hosting my changes was to use gh-pages on my fork, but I tried a tool called surge instead and deployed my site in six keystrokes.

This is what it took to deploy the static webpage right from my command line. Let’s dive into how this tool is as easy as it gets.

What is surge?
surge is a web-publishing tool aimed at front-end developers to help them get their static web pages up and running easily. It can be used to deploy HTML, CSS and JS with the ease of a single command.

How to use surge?
surge is quite an easy tool to use. It is distributed as an npm package. For folks who don’t know what npm is: npm is the JavaScript package manager.

To have surge running, you need to have Node.js installed. Run these in the terminal:

sudo apt-get update 
sudo apt-get install nodejs
sudo apt-get install npm 

Now you have Nodejs as well as npm installed. Let’s move on to the main course – installing surge.

npm install --global surge

(You may need to preface this command with sudo.)

You have installed surge!

So let’s go to the directory where we have our files to deploy. Here I have the labs.fossasia.org repository which we’ll try to deploy.

To clone this repo, run this command:

git clone git@github.com:fossasia/labs.fossasia.org.git

After cd-ing into the directory named labs.fossasia.org, type

surge

and hit enter.

You’ll be prompted to sign up with your email. Choose a password. After that you’ll see something similar to this.  

Properties of the directory – path and size are listed here. Also, as you can see in the picture, a domain is listed. This is a randomly generated domain by surge. You can stick with it too, or just delete it and type whatever domain you like. surge will deploy your directory to that domain, provided that it is available.

In this example, I chose to drop elfin-education and go with my-labs.surge.sh.

Press enter after typing in the desired domain name and you’ll see surge uploading files to the domain. After it deploys successfully, you’ll get a success message.

That’s it. Finally, it’s time to check my-labs.surge.sh.

Saving your Domain with CNAME

Next up we take a look at making surge remember the domain.

By default, you’ll be prompted for a domain name every time you run surge inside the same directory. This can be avoided by simply adding a CNAME file to your directory root. Let’s say that you want to stick with ‘my-labs.surge.sh’ in the above example. You can add it to the CNAME file by running this in the terminal.

  echo my-labs.surge.sh > CNAME  

surge also offers adding your own custom domain for deployments. To know about this and read further about surge, visit surge.sh.
