Handling Errors While Parsing the yaml File in Yaydoc

Yaydoc, our automatic documentation generator uses a yaml file to read a user’s configuration. The internal configuration parser basically converts the yaml file to a python dictionary. Then, it serializes the values of that dictionary using a custom serialization format. From there it associates those values with environment variables which are then passed to bash scripts for various tasks such as deployment, generation, etc.. Some of those environment variables are again passed to another python layer which interacts with sphinx where they are deserialized before use. This whole system works pretty well for our use cases.

Now let’s assume a user adds a yaml file where they have a malformed section in the file. For example, to specify a theme, one needs to add the following to the yaml file.

build:
  theme:
    name: sphinx_fossasia_theme

But our user has the following in their yaml file.

build:
  theme: sphinx_fossasia_theme

Now this will raise an error as we expect a dictionary as a value for the key ‘theme’ but we got a string. Now how do we handle such cases without ignoring the entire file as that would be too much of a penalty for such a small mistake? One approach would have been to wrap each call to connect with a bunch of try-catch but that would render the code unreadable as the initial motivation for implementing the connect method was to abstract the internal implementation so that other contributors who may not be well versed with python can also easily add config options without needing to learn a bunch of python constructs.

So, what we did was that, while merging the dictionary containing default options and the dictionary containing the user preferences, we check whether the default has the same data type as that of the incoming value. If they are, It’s deemed safe to merge. There are certain relaxations though, like if the current type is a list, then the incoming value can be of any time as that can always be converted to a list of a single element. This is required to support the following syntax.

key:
  - value
key: value

The above two blocks are equivalent due to the above-mentioned approach although the type is different.

Now, after this pre-validation step is over we can ensure that the if the assumed type for a key is let’s say a dictionary, then it would be a dictionary. Hence no type errors would be raised like trying to access a dict method for another object, say a string which happened with the earlier implementation. After this, an extra parameter was added to the connect method to which we can now pass a validation function which if returns false, those values would be ignored. Usage of this feature has been implemented to a small level where we validate the links to subprojects and if they look like a valid github repo only then will they be included. Note that their existence is not checked. Only a regex based validation is performed.

It was also important to notify the user about these events when we detect that a specific section is invalid and provide informative and helpful error messages without failing the build. Hence proper error messages were also added which were informative so that the user knows exactly which section is to blame. This is similar to compilers where the error message is crucial to debug a certain piece of code.

Resources

Implementing an Interface for Reading Configuration from a YAML File for Yaydoc

Yaydoc reads configuration specified in a YAML file to set various options during the build process. This allows users to customize various properties of the build process. The current implementation for this was very basic. Basically it uses a pyYAML, a yaml parser for python to read the file and convert it to a python dictionary. From the dictionary we extracted values for various properties and converting them to strings using various heuristics such as converting True to ”true”, False to ”false”, a list to comma separated string and None to an empty string. Finally, we exported variables with those values.

Recently the entire code for this was rewritten using object-oriented paradigm. The motivation for this came from the fact that the implementation lacked certain features and also required some refactoring for long term readability. In the following paragraph, I have discussed the new implementation.

Firstly a Configuration class was created which basically wraps around a dictionary and provide certain utility methods. The primary difference is that the Configuration class allows dotted key access. This means that you can use the following syntax to access nested keys.

theme = conf[‘build.theme.name’]

The class provides another method connect which is used to connect environment variables with configuration values. This method also takes a dotted key but provides an extension on top of that to handle the case when a certain option can take multiple values. For example,

option: my_option

Or,

option:
  - my_option1
  - my_option2

To indicate that a certain config is of this type, you can specify a “@” character at the end of the key. Anything after the “@” character is assumed to be an attribute of each element within the list. Let’s see an example of this whole process.

build:
  subproject:
    - url: <url1>
  source: “doc”
    - url: <url2>

Now to extract all urls from the above file, we’d need to do the following

config.connect(‘SUBPROJECT_URLS’, ‘[email protected]’)

To extract sources, we’ll also use the default parameter as the source option is optional.

config.connect(‘SUBPROJECT_SOURCES’, [email protected]’, default=’docs’)

Finally, The Configuration object also provides a getenv method which reads all connection and serializes values to string according to the previously described heuristics. It then returns a dictionary of all environment variables which must be set.

Resources

Specifying Configurations for Yaydoc with .yaydoc.yml

Like many continuous integration services, Yaydoc also reads configurations from a YAML file. To get started with Yaydoc the first step is to create a file named .yaydoc.yml and specify the required options. Recently the yaydoc team finalized the layout of the file. You can still expect some changes as the projects continues to grow on but they shall not be major ones. Let me give you an outline of the entire layout before describing each in detail. Overall the file is divided into four sections.

  • metadata
  • build
  • publish
  • extras

For the first three sections, their intention is pretty clear from their names. The last section extras is meant to hold settings related to integration to external services.

Following is a description of config options under the metadata section and it’s example usage.

Key Description Default
author

The author of the repository. It is used to construct the copyright text.

user/organization
projectname

The name of the project. This would be displayed on the generated documentation.

Name of the repository
version

The version of the project. This would be displayed alongside the project name

Current UTC date
debug

If true, the logs generated would be a little more verbose. Can be one of true|false.

true
inline_math

Whether inline math should be enabled. This only affects markdown documents.

false
autoindex

This section contains various settings to control the behavior of the auto generated index. Use this to customize the starting page while having the benefit of not having to specify a manual index.

metadata:
  author: FOSSASIA
  projectname: Yaydoc
  version: development
  debug: true
  inline_math: false

Following is a description of config options under the build section and it’s example usage.

Key Description Default
theme

The theme which should be used to build the documentation. The attribute name can be one of the built-in themes or any custom sphinx theme from PyPI. Note that for PyPI themes, you need to specify the distribution name and not the package name. It also provides an attribute options to control specific theme options

sphinx_fossasia_theme
source

This is the path which would be scanned for markdown and reST files. Also any static content such as images referenced from embedded html in markdown should be placed under a _static directory inside source. Note that the README would always be included in the starting page irrespective of source from the auto-generated index

docs/
logo

The path to an image to be used as logo for the project. The path specified should be relative to the source directory.

markdown_flavour

The markdown flavour which should be used to parse the markdown documents. Possible values for this are markdown, markdown_strict, markdown_phpextra, markdown_github, markdown_mmd and commonmark. Note that currently this option is only used for parsing any included documents using the mdinclude directive and it’s unlikely to change soon.

markdown_github
mock

Any python modules or packages which should be mocked. This only makes sense if the project is in python and uses autodoc has C dependencies.

autoapi

If enabled, Yaydoc will crawl your repository and try to extract API documentation. It Provides attributes for specifying the language and source path. Currently supported languages are java and python.

subproject

This section can be used to include other repositories when building the docs for the current repositories. The source attribute should be set accordingly.

github_button

This section can be used to include various Github buttons such as fork, watch, star, etc.

hidden
github_ribbon

This section can be used to include a Github ribbon linking to your github repo.

hidden
build:
  theme:
    name: sphinx_fossasia_theme
  source: docs
  logo: images/logo.svg
  markdown_flavour: markdown_github
  mock:
    - numpy
    - scipy
  autoapi:
    - language: python
      source: modules
    - language: java
  subproject:
    - url: <URL of Subproject 1>
      source: doc
    - url: <URL of subproject 2>

Following is a description of config options under the publish section and it’s example usage.

Key Description Default
ghpages

It provides a attribute url whose value is written in a CNAME file while publishing to github pages.

heroku

It provides an app_name attribute which is used as the name of the heroku app. Your docs would be deployed at <app_name>.herokuapp.com

publish:
  ghpages:
    url: yaydoc.fossasia.org
  heroku:
    app_name: yaydoc 

Following is a description of config options under the extras section and it’s example usage.

Key Description Default
swagger

This can be used to include swagger API documentation in the build. The attribute url should point to a valid swagger json file. It also accepts an additional parameter ui which for now only supports swagger.

javadoc

It takes an attribute path and can include javadocs from the repository.

extras:
  swagger:
    url: http://api.susi.ai/docs/swagger.json
    ui: swagger
  javadoc:
    path: 'src/' 

Resources

Using a YAML file to read configuration options in Yaydoc

Yaydoc provides access to a lot of configurable variables which can be set as per requirements to configure various sections of the build process. You can see the entire list of variables in the project’s homepage. Till now the only way to do this was to set appropriate environment variables. Since a web user interface for yaydoc is in development, providing a clean UI was very important. This meant that we could not just create a bunch of input fields for all variables as that could be overwhelming for any new user. So we decided to ask only minimal information in the web form and read other variables if the user chooses to specify from a YAML file in the target repository.

To read a YAML file, we used PyYaml. It is a well established Python package to safely read info from a YAML file and convert it to a Python’s dictionary. Here is the code snippet for that.

def get_yaml_config():
    try:
        with open('.yaydoc.yml', 'r') as file:
            conf = yaml.safe_load(file)
    except FileNotFoundError:
        return {}
    return conf

The above code snippet returns a dictionary specifying all keys read from the YAML file. Since none of the options are required, we first create a dictionary with all defaults and recursively merges it with the yaml dict. The merging is done using the following code snippet:

for key, value in head.items():
    if isinstance(base, dict):
        if isinstance(value, dict):
            base[key] = update_dict(base.get(key, {}), value)
        else:
           base[key] = head[key]
    else:
        base = {key: head[key]}
return base

Now you can create a .yaydoc.yml file in the root of your repository and yaydoc would read options from there. Here is a sample yml file.

metadata:
  author: FOSSASIA
  projectname: Yaydoc
  version: development

build:
  doctheme: fossasia_theme
  docpath: docs/
  logo: images/logo.svg
  markdown_flavour: markdown_github

publish:
  ghpages:
    docurl: yaydoc.fossasia.org

It should be noted that the layout of the file may change in the future as the project is in active development.

Resources