Handling Errors While Parsing the YAML File in Yaydoc
Yaydoc, our automatic documentation generator, uses a YAML file to read a user’s configuration. The internal configuration parser converts the YAML file to a Python dictionary, then serializes the values of that dictionary using a custom serialization format. It associates those values with environment variables, which are passed to Bash scripts for tasks such as generation and deployment. Some of those environment variables are passed on to another Python layer that interacts with Sphinx, where they are deserialized before use. This whole system works pretty well for our use cases.
Now let’s assume a user adds a YAML file with a malformed section. For example, to specify a theme, one needs to add the following to the YAML file.
build:
  theme:
    name: sphinx_fossasia_theme
But our user has the following in their YAML file.
build:
  theme: sphinx_fossasia_theme
This raises an error because we expect a dictionary as the value for the key ‘theme’ but get a string. How do we handle such cases without ignoring the entire file, which would be too severe a penalty for such a small mistake? One approach would have been to wrap each call to connect in a try/except block, but that would make the code unreadable. The initial motivation for implementing the connect method was to abstract away the internal implementation, so that contributors who are not well versed in Python can add config options without needing to learn a bunch of Python constructs.
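To make the failure concrete, here is a minimal sketch of the kind of type error involved. The dictionaries are what a YAML parser would produce for the two snippets above; theme_name is a hypothetical accessor written for illustration, not Yaydoc's actual code.

```python
# Dictionaries as a YAML parser would produce them for the two snippets.
expected = {"build": {"theme": {"name": "sphinx_fossasia_theme"}}}
malformed = {"build": {"theme": "sphinx_fossasia_theme"}}


def theme_name(config):
    # Hypothetical accessor: assumes 'theme' is always a dictionary.
    return config["build"]["theme"].get("name")


print(theme_name(expected))

try:
    theme_name(malformed)
except AttributeError as error:
    # A str has no .get() method, so the lookup fails here.
    print("config error:", error)
```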
So, while merging the dictionary containing the default options with the dictionary containing the user’s preferences, we check whether the incoming value has the same data type as the default. If it does, it is deemed safe to merge. There are certain relaxations, though: if the default is a list, the incoming value can be of any type, since it can always be converted to a single-element list. This is required to support the following syntax.
key:
  - value

key: value
The two blocks above are equivalent under this approach, even though the types of their values differ.
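A type-checked merge of this kind could be sketched roughly as follows. This is not Yaydoc's actual implementation: the function name merge, the warning format, the pass-through of unknown keys and the fallback to the default on a mismatch are all assumptions made for illustration.

```python
import copy


def merge(defaults, user, path=""):
    """Merge `user` into a copy of `defaults`, skipping mistyped values.

    Returns the merged dictionary and a list of warning strings.
    """
    merged = copy.deepcopy(defaults)
    warnings = []
    for key, value in user.items():
        where = f"{path}.{key}" if path else key
        if key not in merged:
            # Assumption: keys without a default are passed through as-is.
            merged[key] = value
            continue
        default = merged[key]
        if isinstance(default, list):
            # Relaxation: any value can become a single-element list.
            merged[key] = value if isinstance(value, list) else [value]
        elif isinstance(default, dict) and isinstance(value, dict):
            merged[key], nested = merge(default, value, where)
            warnings.extend(nested)
        elif type(default) is type(value):
            merged[key] = value
        else:
            # Type mismatch: keep the default and report the section.
            warnings.append(
                f"{where}: expected {type(default).__name__}, "
                f"got {type(value).__name__}; using default"
            )
    return merged, warnings
```

With hypothetical defaults such as {"build": {"theme": {"name": "default_theme"}, "subproject": []}}, a user file that sets theme to a bare string keeps the default theme and yields a warning naming build.theme, while a bare string for subproject is wrapped into a one-element list.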
Once this pre-validation step is over, we can be sure that if the assumed type for a key is, say, a dictionary, then the value really is a dictionary. Hence no type errors are raised from, for example, calling a dict method on a string, which happened with the earlier implementation. After this, an extra parameter was added to the connect method, through which we can pass a validation function; values for which it returns false are ignored. So far this feature is used in a small way: we validate the links to subprojects and include them only if they look like valid GitHub repositories. Note that their existence is not checked; only a regex-based validation is performed.
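The validation hook might look something like the sketch below. The signature of connect, the regular expression, and the SUBPROJECT_URLS name are all assumptions; the real connect method also handles serialization, which is omitted here.

```python
import re

# Assumed pattern: https://github.com/<owner>/<repo>, optional ".git" or "/".
GITHUB_REPO = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+?(?:\.git)?/?$")


def looks_like_github_repo(url):
    # Purely syntactic check; the repository's existence is not verified.
    return isinstance(url, str) and GITHUB_REPO.match(url) is not None


def connect(env, name, values, validate=None):
    """Hypothetical stand-in for connect: store `values` under `name`,
    dropping any value for which `validate` returns false."""
    if validate is not None:
        values = [value for value in values if validate(value)]
    env[name] = values
    return env


env = connect(
    {},
    "SUBPROJECT_URLS",
    ["https://github.com/fossasia/yaydoc", "not a repo"],
    validate=looks_like_github_repo,
)
print(env)
```

Keeping the validator as an optional argument means existing calls to connect need no changes, and a contributor adding a new option only has to supply a small predicate function.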
It was also important to notify the user when we detect that a specific section is invalid, without failing the build. Hence informative error messages were added so that the user knows exactly which section is to blame. This is similar to compilers, where the error message is crucial for debugging a piece of code.
Resources
- How to recursively merge Python dictionaries – Stack Overflow post by bscan