Getting SUSI Skill at a Commit ID

Susi Skill CMS is a web app to edit and create new skills. We use Git for storing different versions of Susi Skills. So what if we want to roll back to a previous version of the skill? To implement this feature in Susi Skill CMS, we needed an API endpoint which accepts the name of the skill and the commit ID and returns the file at that commit ID.

In this blog post I will tell about making an API endpoint which works similar to git show.

First we will accept all the request parameters from the GET request.

        String model_name = call.get("model", "general");
        String group_name = call.get("group", "Knowledge");
        String language_name = call.get("language", "en");
        String skill_name = call.get("skill", "wikipedia");
        String commitID  = call.get("commitID", null);

In this we get the model name, category, language name, skill name and the commit ID. The above 4 parameters are used to make a file path that is used to find the location of the skill in the Susi Skill Data repository.

This servlet need CommitID to work and if commit ID is not given in the request parameters then we send an error message saying that the commit id is null and stop the servlet execution.

    Repository repository = DAO.getRepository();
    ObjectId CommitIdObject = repository.resolve(commitID);

Then we get the git repository of the skill from the DAO and initialize the repository object.

From the commitID that we got in the request parameters we create a CommitIdObject.

   (RevWalk revWalk = new RevWalk(repository)) {
   RevCommit commit = revWalk.parseCommit(CommitIdObject);
   RevTree tree = commit.getTree();


Now using commit’s tree, we will find the find the path and get the tree of the commit.

From the TreeWalk in the repository we will set a filter to find a file. This searches recursively for the files inside all the folders.

                revWalk = new RevWalk(repository)) {
                try (TreeWalk treeWalk = new TreeWalk(repository)) {
                    treeWalk.addTree(tree);
                    treeWalk.setRecursive(true);
                    treeWalk.setFilter(PathFilter.create(path));
                    if (!treeWalk.next()) {
                        throw new IllegalStateException("Did not find expected file");
                    }

If the TreeWalk reaches to an end and does not find the specified skill path then it returns anIllegal State Exception with an message saying did not found the file on that commit ID.

       ObjectId objectId = treeWalk.getObjectId(0);
       ObjectLoader loader = repository.open(objectId);
       OutputStream output = new OutputStream();
       loader.copyTo(output);

And then one can the loader to read the file. From the treeWalk we get the object and create an output stream to copy the file content in it. After that we create the JSON and put the OutputStream object as as String in it.

       json.put("file",output);

This Servlet can be seen working api.susi.ai: http://api.susi.ai/cms/getFileAtCommitID.json?model=general&group=knowledge&language=en&skill=bitcoin&commitID=214791f55c19f24d7744364495541b685539a4ee

Resources

JGit Documentation: https://eclipse.org/jgit/documentation/

JGit User Guide: http://wiki.eclipse.org/JGit/User_Guide

JGit Repository access: http://www.codeaffine.com/2014/09/22/access-git-repository-with-jgit/

JGit Github: https://github.com/eclipse/jgit

Continue ReadingGetting SUSI Skill at a Commit ID

Implementing revisioning feature in Open Event

{ Repost from my personal blog @ https://blog.codezero.xyz/implementing-revisioning-feature-in-open-event }

As I said in my previous blog post about Adding revisioning to SQLAlchemy Models,

In an application like Open Event, where a single piece of information can be edited by multiple users, it’s always good to know who changed what. One should also be able to revert to a previous version if needed.

Let’s have a quick run through on how we can enable SQLAlchemy-Continuum on our project.

  1. Install the library SQLAlchemy-Continuum with pip
  2. Add __versioned__ = {} to all the models that need to be versioned.
  3. Call make_versioned() before the models are defined
  4. Call configure_mappers from SQLAlchemy after declaring all the models.

Example:

import sqlalchemy as sa  
from sqlalchemy_continuum import make_versioned

# Must be called before defining all the models
make_versioned()

class Event(Base):

    __tablename__ = 'events'
    __versioned__ = {}  # Must be added to all models that are to be versioned

    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    name = sa.Column(sa.String)
    start_time = sa.Column(db.DateTime, nullable=False)
    end_time = sa.Column(db.DateTime, nullable=False)
    description = db.Column(db.Text)
    schedule_published_on = db.Column(db.DateTime)

# Must be called after defining all the models
sa.orm.configure_mappers()

We have SQLAlchemy-Continuum enabled now. You can do all the read/write operations as usual. (No change there).

Now, for the part where we give the users an option to view/restore revisions. The inspiration for this, comes from wordpress’s wonderful revisioning functionality.

The layout is well designed. The differences are shown in an easy-to-read form. The slider on top makes it intuitive to move b/w revisions. We have a Restore this Revision button on the top-right to switch to that revision.

A similar layout is what we would like to achieve in Open Event.

  1. A slider to switch b/w sessions
  2. A pop-over infobox on the slider to show who made that change
  3. A button to switch to that selected revision.
  4. The colored-differences shown in side-by-side manner.

To make all this a bit easier, SQLAlchemy-Continuum provides us with some nifty methods.

count_versions is a method that allows us to know the number of revisions a record has.

event = session.query(Event).get(1)  
count = count_versions(event)  # number of versions of that event

Next one is pretty cool. All the version objects have a property called as changeset which holds a dict of changed fields in that version.

event = Event(name=u'FOSSASIA 2016', description=u'FOSS Conference in Asia')  
session.add(article)  
session.commit(article)

version = event.versions[0]  # first version  
version.changeset  
# {
#   'id': [None, 1],
#   'name': [None, u'FOSSASIA 2016'],
#   'description': [None, u'FOSS Conference in Asia']
# }

event.name = u'FOSSASIA 2017'  
session.commit()

version = article.versions[1]  # second version  
version.changeset  
# {
#   'name': [u'FOSSASIA 2016', u'FOSSASIA 2017'],
# }

As you can see, dict holds the fields that changed and the content the changed (before and after). And this is what we’ll be using for generating those pretty diffs that the guys and girls over at wordpress.com have done. And for this we’ll be using two things.

  1. A library named diff-match-patch. It is a library from Google which offers robust algorithms to perform the operations required for synchronizing plain text.
  2. A small recipe from from code.activestate.com Line-based diffs with the necessary HTML markup for styling insertions and deletions.
import itertools  
import re

import diff_match_patch

def side_by_side_diff(old_text, new_text):  
    """
    Calculates a side-by-side line-based difference view.

    Wraps insertions in <ins></ins> and deletions in <del></del>.
    """
    def yield_open_entry(open_entry):
        """ Yield all open changes. """
        ls, rs = open_entry
        # Get unchanged parts onto the right line
        if ls[0] == rs[0]:
            yield (False, ls[0], rs[0])
            for l, r in itertools.izip_longest(ls[1:], rs[1:]):
                yield (True, l, r)
        elif ls[-1] == rs[-1]:
            for l, r in itertools.izip_longest(ls[:-1], rs[:-1]):
                yield (l != r, l, r)
            yield (False, ls[-1], rs[-1])
        else:
            for l, r in itertools.izip_longest(ls, rs):
                yield (True, l, r)

    line_split = re.compile(r'(?:r?n)')
    dmp = diff_match_patch.diff_match_patch()

    diff = dmp.diff_main(old_text, new_text)
    dmp.diff_cleanupSemantic(diff)

    open_entry = ([None], [None])
    for change_type, entry in diff:
        assert change_type in [-1, 0, 1]

        entry = (entry.replace('&', '&amp;')
                      .replace('<', '&lt;')
                      .replace('>', '&gt;'))

        lines = line_split.split(entry)

        # Merge with previous entry if still open
        ls, rs = open_entry

        line = lines[0]
        if line:
            if change_type == 0:
                ls[-1] = ls[-1] or ''
                rs[-1] = rs[-1] or ''
                ls[-1] = ls[-1] + line
                rs[-1] = rs[-1] + line
            elif change_type == 1:
                rs[-1] = rs[-1] or ''
                rs[-1] += '<ins>%s</ins>' % line if line else ''
            elif change_type == -1:
                ls[-1] = ls[-1] or ''
                ls[-1] += '<del>%s</del>' % line if line else ''

        lines = lines[1:]

        if lines:
            if change_type == 0:
                # Push out open entry
                for entry in yield_open_entry(open_entry):
                    yield entry

                # Directly push out lines until last
                for line in lines[:-1]:
                    yield (False, line, line)

                # Keep last line open
                open_entry = ([lines[-1]], [lines[-1]])
            elif change_type == 1:
                ls, rs = open_entry

                for line in lines:
                    rs.append('<ins>%s</ins>' % line if line else '')

                open_entry = (ls, rs)
            elif change_type == -1:
                ls, rs = open_entry

                for line in lines:
                    ls.append('<del>%s</del>' % line if line else '')

                open_entry = (ls, rs)

    # Push out open entry
    for entry in yield_open_entry(open_entry):
        yield entry

So, what we have to do is,

  1. Get the changeset from a version
  2. Run each field’s array containing the old and new text through the side_by_side_diff method.
  3. Display the output on screen.
  4. Use the markups <ins/> and <del/> to style changes.

So, we do the same for each version by looping through the versions array accessible from an event record.

For the slider, noUiSlider javascript library was used. Implementation is simple.

<div id="slider"></div>

<script type="text/javascript">  
    $(function () {
        var slider = document.getElementById('slider');
        noUiSlider.create(slider, {
            start: [0],
            step: 1,
            range: {
                'min': 0,
                'max': 5
            }
        });
    });
</script>

This would create a slider that can go from 0 to 5 and will start at position 0.

By listening to the update event of the slider, we’re able to change which revision is displayed.

slider.noUiSlider.on('update', function (values, handle) {  
    var value = Math.round(values[handle]);
    // the current position of the slider
    // do what you have to do to change the displayed revision
});

And to get the user who caused a revision, you have to access the user_idparameter of the transaction record of a particular version.

event = session.query(Event).get(1)  
version_one = event.versions[0]  
transaction = transaction_class(version_one)  
user_id = transaction.user_id

So, with the user ID, you can query the user database to get the user who made that revision.

The user_id is automatically populated if you’re using Flask, Flask-login and SQLAlchemy-Continuum’s Flask Plugin. Enabling the plugin is easy.

from sqlalchemy_continuum.plugins import FlaskPlugin  
from sqlalchemy_continuum import make_versioned

make_versioned(plugins=[FlaskPlugin()])

This is not a very detailed blog post. If you would like to see the actual implementation, you can checkout the Open Event repository over at GitHub. Specifically, the file browse_revisions.html.

The result is,

Still needs some refinements in the UI. But, it gets the job done :wink:

Continue ReadingImplementing revisioning feature in Open Event