Final Report | Stage 7 | The final report

Anand Panchbhai | A Neural QA Model for DBpedia | GSoC’19


Welcome to the final report of my 2019 GSoC project. If you are a newbie and want to know more about the journey of this project, do read it from the very beginning; otherwise, let's dive straight in.

If you want to have a look at the code, please find it at the following link: GitHub Repository:

The whole work was added as a single pull request whose link is as follows:

You can contact me at: panchbhai1969[at]gmail[dot]com

This page is divided into 3 parts:

The meeting document that was maintained for the whole duration of the GSoC project can be accessed through: Minutes of the Meeting

We will try to keep it short and simple. Let's begin.



With a booming amount of information being continuously added to the internet, organising the facts becomes a very difficult task. Currently DBpedia hosts billions of such data points and corresponding relations in the RDF format.

Extracting data from such data sources requires a query to be made in SPARQL and the response to the query is a link that contains the information pertaining to the answer or the answer itself.
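To give a feel for what such a query looks like, here is a minimal sketch that writes a SPARQL query for a sample question and builds the request URL for the public DBpedia endpoint. The sample question, the choice of `dbo:birthPlace`, the `format` parameter value and the helper name `build_request_url` are assumptions made for illustration; they are not taken from the project code. No network request is made.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative question (an assumption): "Where was Albert Einstein born?"
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?place WHERE { dbr:Albert_Einstein dbo:birthPlace ?place }
"""

def build_request_url(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": query.strip(),
        # Assumed way of requesting JSON from the endpoint.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)

if __name__ == "__main__":
    print(build_request_url(QUERY))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return either the answer value or a DBpedia resource link, which matches the two kinds of response described above.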

Accessing such data is difficult for a lay user who does not know how to write a query. This proposal tries to build upon an existing system that aims to make this humongous linked data available to a larger user base in their natural languages (now restricted to English) by improving, adding to and amending the existing codebase.

The primary objective of the project was to be able to translate any natural language question to a valid SPARQL query.
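To make the objective concrete, the sketch below shows one way a question template and a SPARQL template could be paired and then filled with an entity to produce a training example. The `<A>` placeholder convention, the template pair itself and the `fill_template` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical template pair: the same placeholder appears in the
# natural language question and in the SPARQL query.
TEMPLATE = {
    "question": "where was <A> born",
    "query": "SELECT ?x WHERE { <A> dbo:birthPlace ?x }",
}

def fill_template(template: dict, label: str, resource: str) -> tuple:
    """Return a (question, query) pair for one entity."""
    question = template["question"].replace("<A>", label)
    query = template["query"].replace("<A>", resource)
    return question, query

if __name__ == "__main__":
    question, query = fill_template(
        TEMPLATE, "Albert Einstein", "dbr:Albert_Einstein"
    )
    print(question)
    print(query)
```

Pairs like these are what a translation model can learn from: the question side is the source sentence and the query side is the target sentence.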

Stage-Wise Explanation

The whole project was divided into 7 stages according to the proposal submitted. The stage structure is maintained for the ease of grasping the movement of the project through the timeline:

Stage 0 | Community bonding period (May 6 - 27, 2019)

The goal was to understand the current code base in detail, ponder all possible improvements and discuss them with the mentors.

Coding period (May 27, 2019 - August 19, 2019)

Stage 1 | Improvements (Based on current state of research): (May 27, 2019 - June 4, 2019)

The first stage will mainly focus on fixing all issues in the code base to create a proper playing ground for future research endeavour that the project intends to take.

Stage 2 | Where do we stand today? (June 5, 2019 - June 10, 2019)

This stage will shed light on where we stand and forge a concrete path for the project to take (according to Aman’s blog, extensive work couldn’t be done on compositionality for complex QA because of the time constraints of the project).

Stage 3 | Generalised question making framework for compositionality (June 10, 2019 - June 23, 2019)

Stage 4 | Let’s make it all natural (June 28, 2019 - July 5, 2019)

Making the questions more natural was a rather interesting problem. I used a mechanism similar to PageRank, used by Google.
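The exact mechanism is not spelled out here, so the following is only a sketch of the standard PageRank power iteration on a tiny graph, to show the kind of ranking idea involved. The graph, the damping factor, the iteration count and the function name are assumptions; this is not the project's implementation.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Plain power-iteration PageRank over a dict of node -> outgoing links.

    Every link target must also appear as a key in `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Toy graph (an assumption): "c" is linked to by both "a" and "b".
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 4))
```

Nodes that are pointed to by many well-ranked nodes end up with a higher score, which is the property that makes this family of methods useful for ranking.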

Evaluation 1: June 24 - 28, 2019

Stage 5 | Finishing Question Making (July 6, 2019 - July 21, 2019)

This stage will be the last stage that tries to address the problems related to template generation for simple and complex QA.

Evaluation 2: July 22 - 26, 2019

Stage 6 | The Grid Search (July 27, 2019 - August 10, 2019)

This stage evaluates the performance of the model by tweaking the attributes of the NMT model to get maximum performance, using the training dataset generated in the previous stages.
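A grid search of this kind can be sketched as below. The search space (`num_layers`, `dropout`) and the `toy_score` function are hypothetical stand-ins: the real attributes tuned are not listed here, and in practice the scoring function would train the NMT model with the given settings and return a validation metric.

```python
from itertools import product

def grid_search(grid: dict, evaluate) -> tuple:
    """Try every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space, for illustration only.
GRID = {"num_layers": [1, 2], "dropout": [0.2, 0.5]}

def toy_score(params: dict) -> float:
    """Stand-in for 'train the model and return a validation metric'."""
    return params["num_layers"] - params["dropout"]

if __name__ == "__main__":
    print(grid_search(GRID, toy_score))
```

The number of runs grows as the product of the list lengths, so with real training runs it pays to keep each list short.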

Stage 7 (August 11, 2019 - August 18, 2019)

Students Submit Code and Final Evaluations: August 19 - 26, 2019

Future aspects of this project
