WeekSeven | Stage 3 | Looking Beyond: making questions more human

Pretty cool week! Here are the tasks I was supposed to accomplish this week:

Fixing the issues with GERBIL & F1 Score

Boy, this was a task that I kept carrying over as undone in my meetings for weeks. It is finally completed and working like a charm; it’s a bit slow, but it works!


Apart from that, here is a portal I made for Q&A; it’s pretty nice. I don’t have a static public IP, so I will try to host it somewhere:


More on compositionality based on the mentor’s suggestions

Brainstorming

This turned out to be an interesting week, with me thinking hard about which methods to use to tackle the problem that arose in the previous meeting, which was:

The questions seem to be useful, but they are at times nonsensical and grammatically incorrect. Thus they do not represent real questions that the end system will encounter.

So it was time to get back to analysing what could be done! Let’s first bring together all the ideas that came up in the meeting:


Some of the suggestions given by the mentors in the previous meeting were:

I also remembered that the previous developer’s proposal had some mechanism through which he was trying to get the importance of the entities. When I read his proposal again, I came across these points, which seemed useful:

The most important of all were some links posted by Tommaso Soru on Slack:

this is a list of KGE approaches: https://gist.github.com/mommi84/07f7c044fa18aaaa7b5133230207d8d4

this is a pre-trained embedding model for DBpedia 2016-04: https://zenodo.org/record/1320038#.XQfS4XvTV25

Though I was not able to download the accompanying dataset in the second one, the supporting paper was really useful. The paper talked about biased walks in knowledge graphs. The PageRank-based model could deliver good performance. This made me realise that this was exactly what I needed.

I started digging deeper into how page ranking is done and read some really cool papers, such as The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page, to understand page ranking, and others, including PageRank on Wikipedia: Towards General Importance Score for Entities by Andreas et al. The methodology described there was very useful, and the reference dataset section of the paper further directed me to SubjectiveEye3D.


Paul Houle aggregated the Wikipedia page views of the years 2008 to 2013 with different normalization factors (particularly considering the dimensions articles, language, and time). As such, SubjectiveEye3D reflects the aggregated chance for a page view of a specific article in the interval years 2008 to 2013. However, similar to unnormalized PageRank, the scores need to be interpreted in relation to each other (i.e., the scores do not reflect a proper probability distribution as they do not add up to one).

After many iterations of experiments, I settled on one ranking method that seemed most feasible and could generate visible results. The methodology was based on principles similar to SubjectiveEye3D:

Hypothesis: The relevance of a template can be determined by the popularity of the corresponding answers.

I proposed a ranking mechanism similar to Google PageRank for this purpose.

Using this hypothesis, I moved on to checking the viability of the above statement. Which questions sound natural to a human and which don’t might depend on page view data, which is a good way of measuring the relevance of a given topic in the current world scenario. Here are the entities that occupy the first 50 ranks of the ranking generated by SubjectiveEye3D in an older dataset:

SUB first 50 Ranks

In my opinion, that does look like it could capture current trends very well :D. I did ponder other methods, but this application of SubjectiveEye3D seemed like the best fit for my use case.


So basically what I did was this:

I downloaded the resulting ranking of DBpedia entities from SubjectiveEye3D’s GitHub repository. As SubjectiveEye3D only contains DBpedia entities, compatibility worked out well. Entries in the downloaded file were in this format:

Entity Relation Rank
http://dbpedia.org/resource/!!! http://rdf.basekb.com/public/subjectiveEye3D “1.3698837E-4”^^http://www.w3.org/2001/XMLSchema#float .
http://dbpedia.org/resource/!!!_(album) http://rdf.basekb.com/public/subjectiveEye3D “1.4383285E-5”^^http://www.w3.org/2001/XMLSchema#float .
http://dbpedia.org/resource/!! http://rdf.basekb.com/public/subjectiveEye3D “1.3581338E-5”^^http://www.w3.org/2001/XMLSchema#float .
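To use these ranks in code, the file has to be turned into an entity-to-rank lookup first. Here is a minimal sketch of how that could be done, not the exact loader from the repository. The names `TRIPLE_RE` and `load_ranks` are hypothetical, and the pattern assumes every line follows the entity, relation, rank layout shown above.

```python
import re

# Hypothetical pattern for one line of the ranking file. Angle brackets around
# the URIs are optional and both straight and curly quotes are accepted,
# because copies of the file may differ in these details.
TRIPLE_RE = re.compile(
    r'^<?(?P<entity>http://dbpedia\.org/resource/\S+?)>?\s+'
    r'<?http://rdf\.basekb\.com/public/subjectiveEye3D>?\s+'
    r'["“](?P<rank>[^"”]+)["”]'
)


def load_ranks(lines):
    """Build {entity URI: rank as float} from an iterable of ranking-file lines."""
    ranks = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match:  # blank or malformed lines are skipped
            ranks[match.group("entity")] = float(match.group("rank"))
    return ranks


# The three sample lines from the post.
sample = [
    'http://dbpedia.org/resource/!!! http://rdf.basekb.com/public/subjectiveEye3D “1.3698837E-4”^^http://www.w3.org/2001/XMLSchema#float .',
    'http://dbpedia.org/resource/!!!_(album) http://rdf.basekb.com/public/subjectiveEye3D “1.4383285E-5”^^http://www.w3.org/2001/XMLSchema#float .',
    'http://dbpedia.org/resource/!! http://rdf.basekb.com/public/subjectiveEye3D “1.3581338E-5”^^http://www.w3.org/2001/XMLSchema#float .',
]
diction = load_ranks(sample)
```

For the real file, passing an open file handle to `load_ranks` should work the same way, since it only iterates over lines.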

Pseudo Code

So here is what was done: (https://github.com/dbpedia/neural-qa/tree/73937ac9e78382a27f3ba15b3aa8fae07c5f153b)

    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup
    from tqdm import tqdm

    def rank_check(query, diction, count, orignal_count):
        """Return the mean rank of the entities returned for a template query."""
        # Number of extra variables to project next to ?a.
        count = orignal_count - count
        ques = " "
        for value in range(count):
            if value == 0:
                ques = ques + "?x "
            else:
                ques = ques + "?x" + str(value + 1) + " "
        # Sample up to 100 random result rows from DBpedia.
        query = query.replace("(?a)", "(?a)" + ques) + " order by RAND() limit 100"
        query = urllib.parse.quote(query)
        url = "https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query="+query+"&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+"
        page = urllib.request.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        rows = soup.find_all("tr")
        total = len(rows)
        if total == 0:
            return 0.0
        accum = 0
        for row in tqdm(rows):
            for link in row.find_all("a"):
                denom = 0
                interaccum = 0
                for a in link:
                    # `a` is the link text, i.e. the entity URI in the result cell.
                    if a in diction:
                        denom += 1
                        interaccum += float(diction[a])
                # Average over the entities that have a rank; avoid dividing by zero.
                if denom:
                    interaccum = interaccum / denom
                accum += interaccum
        return float(accum / total)


The results were the most fascinating part; here is a small part of them. The color of the rows in column G signifies the degree to which the given question was perceived as natural by the proposed algorithm. The true and false values were based on whether the rank was greater or less than 0.000050026261026025 (which was chosen based on the rank of a very valid question: What is the college of <A> ?). Further work needs to be done to determine the threshold; areas like fuzzy thresholds might be explored for greater insight:

Ranking Results
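The true/false labelling boils down to a single comparison against the threshold. A minimal sketch, assuming each template already has a score such as the one returned by `rank_check`: the helper name `is_natural`, the template names, and the scores are made up for illustration, and treating a score exactly equal to the threshold as natural is an assumption (so that the reference question itself passes).

```python
# Threshold quoted in the post: the rank of "What is the college of <A> ?".
THRESHOLD = 0.000050026261026025


def is_natural(score, threshold=THRESHOLD):
    """Return True when a template's rank score reaches the threshold.

    Using >= for the boundary case is an assumption, not taken from the post.
    """
    return score >= threshold


# Hypothetical template scores, for illustration only.
template_scores = {
    "template A": 1.2e-4,
    "template B": 3.0e-6,
}
labels = {name: is_natural(score) for name, score in template_scores.items()}
```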

An important observation was that the threshold should be decided class-wise, so a single general threshold should not be used, because pages related to biology are viewed far less often than pages related to celebrities. The relative number of views within biology-related pages is still useful to us, though. Here is an example of another class:

Ranking Results Eukaryotes
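One possible way to make the threshold class-wise is to derive it from the scores inside each class. The sketch below uses the per-class median as the cut-off, which is purely an assumption for illustration and not something that was settled in the project. The class names, the scores, and the helper names `classwise_thresholds` and `is_natural_in_class` are all hypothetical.

```python
from statistics import median


def classwise_thresholds(scores_by_class):
    """Return {class: threshold}, using the median score of each class.

    The median is an assumed rule; any other per-class statistic could be used.
    """
    return {cls: median(scores) for cls, scores in scores_by_class.items() if scores}


def is_natural_in_class(score, cls, thresholds):
    """Compare a template score with the threshold of its own class."""
    return score >= thresholds[cls]


# Made-up template scores grouped by class, for illustration only.
scores_by_class = {
    "Person": [2.0e-4, 9.0e-5, 4.0e-5],
    "Eukaryote": [3.0e-6, 1.0e-6, 5.0e-7],
}
thresholds = classwise_thresholds(scores_by_class)

# A score that is low in absolute terms can still pass within its own class.
eukaryote_ok = is_natural_in_class(3.0e-6, "Eukaryote", thresholds)
person_ok = is_natural_in_class(3.0e-6, "Person", thresholds)
```

With these made-up numbers, a score of 3.0e-6 passes inside the Eukaryote class but would fail against the Person threshold, which is the effect the class-wise observation is after.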
