Week Six | Stage 3 | Code Crusade Week

Well, well, well, a properly spent week, I would say. A lot of coding and a lot of things to share here. Let's dive straight into it, Savvy!

Savvy

After last week's realisation, I spent a large part of my week coding, committing and learning. These were a few things that I was supposed to finish. Some of them had been lying around for some time, and it was time to take action:


Generating domain independent templates to minimize burden on the end user for both complex and simple QA

Note: All links here are with respect to the repository: https://github.com/dbpedia/neural-qa/tree/bd65938173367126f69d4d1bd04d5644be6572ce

In my proposal, I had laid down tentative pseudocode to be followed, and this week I implemented a hefty part of it. Let’s dive into the nitty-gritty:

Generate URL

The first task was to create a URL generator. The main aim was to create something along the lines of the following:

generate_url(namespace, class) {

       - This function generates a proper URL to extract information about classes, properties and
       data types from the DBpedia database.
       (Like: http://mappings.dbpedia.org/server/ontology/classes/Place)

       - It returns a URL string to the calling function.
}

Example:

python generate_url.py --label person
http://mappings.dbpedia.org/index.php/OntologyClass:Person
http://mappings.dbpedia.org/server/ontology/classes/Person
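
A minimal sketch of what such a generator might look like, assuming the class name is simply the label with its first letter capitalised and that the two URL patterns in the example output are the only ones needed. The function body and the capitalisation rule are my assumptions for illustration, not the repository's actual code:

```python
# Hypothetical sketch: build the two mapping-server URLs for a class label.
MAPPINGS_BASE = "http://mappings.dbpedia.org"


def generate_url(label):
    """Return (wiki_page_url, ontology_class_url) for a class label."""
    # Assumption: the ontology class name is the label with its
    # first letter upper-cased ("person" -> "Person").
    class_name = label[:1].upper() + label[1:]
    wiki_page = f"{MAPPINGS_BASE}/index.php/OntologyClass:{class_name}"
    class_page = f"{MAPPINGS_BASE}/server/ontology/classes/{class_name}"
    return wiki_page, class_page


for url in generate_url("person"):
    print(url)
```

Run with the label person, this prints the same two URLs as the example above.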

Get Properties

Once generate_url returned the URL of the properties page of the given class, the get_properties function fetched that page and extracted the name, label, domain and range of each property.

get_properties(url) {
   - This function takes a URL, or the name of the namespace and class, as input. The
   get_properties.py code in the current codebase
   (https://github.com/AKSW/NSpM/blob/master/gsoc/aman/get_properties.py)
   takes only a URL.
   - On execution, this code creates a CSV file which contains all the properties, ontology,
   class related information and data types as field values in each row.
   - This function also returns a 2D list of the information mentioned above to the calling
   function.
}

Example:

http://mappings.dbpedia.org/server/ontology/classes/Person

Corresponding page: person_properties

Part of extracted properties:

Name                   Label                          Domain  Range
achievement            achievement                    Person  owl:Thing
activeYears            active years                   Person  xsd:string
activeYearsEndDateMgr  active years end date manager  Person  xsd:string
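
To make the extraction step concrete, here is a rough sketch using only the Python standard library. It assumes the properties are laid out as plain HTML table rows with four cells each. The sample markup below is invented for illustration, so the real page may well need a different parser:

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only hold <th> cells and stay empty
                self.rows.append(self._row)
            self._row = None


def extract_properties(html):
    """Return a 2D list: one [name, label, domain, range] list per property."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialise the 2D list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Label", "Domain", "Range"])
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>Label</th><th>Domain</th><th>Range</th></tr>
<tr><td>achievement</td><td>achievement</td><td>Person</td><td>owl:Thing</td></tr>
<tr><td>activeYears</td><td>active years</td><td>Person</td><td>xsd:string</td></tr>
</table>
"""

rows = extract_properties(SAMPLE_HTML)
print(to_csv(rows))
```

The function returns the 2D list and the CSV text is derived from it, which mirrors the two outputs described in the pseudocode.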

Sentence and template generator

Well, this is a tricky and cool one. Let’s dive straight into it.

With the help of the 2 functions defined previously, we get the name, label, domain and range of every property of the given label.

Now it’s time to brew some questions, SPARQL queries, and queries that fetch the entities used to fill the templates for the NSpM model. To give you a brief idea of what each of these terms means, here is an example of the entities such a fetcher query returns:

a
http://dbpedia.org/resource/Andreas_Ekberg
http://dbpedia.org/resource/Danilo_Tognon
http://dbpedia.org/resource/Lorine_Livington_Pruette
http://dbpedia.org/resource/Megan_Lawrence
http://dbpedia.org/resource/Nikolaos_Ventouras
http://dbpedia.org/resource/Sani_ol_molk
http://dbpedia.org/resource/Siniša_Žugić
http://dbpedia.org/resource/William_Bagot,_2nd_Baron_Bagot
http://dbpedia.org/resource/Witold_Gerutto
http://dbpedia.org/resource/Abdoullah_Bamoussa
http://dbpedia.org/resource/Abdul_Waheed_(field_hockey)
http://dbpedia.org/resource/Abdulaziz_Alshatti
http://dbpedia.org/resource/Abdulrahman_Al-Faihan
http://dbpedia.org/resource/Ahmad_Alafasi
http://dbpedia.org/resource/Anatoliy_Abdula
http://dbpedia.org/resource/Antun_Herceg
http://dbpedia.org/resource/Astrit_Hafizi
http://dbpedia.org/resource/Bojan_Pandžić
http://dbpedia.org/resource/Briana_Provancha
http://dbpedia.org/resource/Courtney_Okolo
http://dbpedia.org/resource/Daryl_Homer
http://dbpedia.org/resource/Edward_Ling
http://dbpedia.org/resource/Francesco_Boffo
http://dbpedia.org/resource/Francesco_Brici
http://dbpedia.org/resource/Franciszek_Szymura

The NSpM model uses this list to fill a given template and create question-query pairs like:

Label                      Value
Natural language question  When is the birth date of olga kurylenko ?
SPARQL query               select ?x where { dbr:Olga_Kurylenko dbo:birthDate ?x }
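
As a rough illustration of how one entity from such a list could be slotted into a template pair, here is a small sketch. The function name and the label rule (underscores to spaces, lower-cased) are my assumptions. NSpM's own generator may handle this differently:

```python
def fill_templates(question_template, query_template, resource_uri):
    """Replace the <A> placeholder with one concrete DBpedia entity."""
    # "http://dbpedia.org/resource/Olga_Kurylenko" -> "Olga_Kurylenko"
    name = resource_uri.rsplit("/", 1)[-1]
    # Assumption: the question uses a lower-cased, space-separated label.
    label = name.replace("_", " ").lower()
    question = question_template.replace("<A>", label)
    query = query_template.replace("<A>", "dbr:" + name)
    return question, query


question, query = fill_templates(
    "When is the birth date of <A> ?",
    "select ?x where { <A> dbo:birthDate ?x }",
    "http://dbpedia.org/resource/Olga_Kurylenko",
)
print(question)
print(query)
```

Note that names containing characters such as parentheses or commas (several appear in the list above) would need escaping or a full <...> IRI in a real SPARQL query. This sketch does not handle that.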

With these aspects clear, another point to note is the structure of the output file, which has to be compatible with the NSpM model (fields are separated by semicolons):

Column                                Example value
Ontology 1                            dbo:Person
Cell 2                                (empty)
Ontology 2 (verify)                   (empty)
Natural language query template       When is the birth date of <A> ?
SPARQL query template                 select ?x where { <A> dbo:birthDate ?x }
Compatible entities fetcher template  select distinct(?a) where { ?a dbo:birthDate [] }
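
A short sketch of writing one such row with Python's csv module and a semicolon delimiter. The <A> placeholder is the one used in the templates throughout this post, and the column order follows the header above. The helper name is hypothetical:

```python
import csv
import io

# One template row: ontology, two empty cells, then the three templates.
row = [
    "dbo:Person",
    "",
    "",
    "When is the birth date of <A> ?",
    "select ?x where { <A> dbo:birthDate ?x }",
    "select distinct(?a) where { ?a dbo:birthDate [] }",
]


def to_line(fields):
    """Serialise one row as a semicolon-separated line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", lineterminator="\n").writerow(fields)
    return buffer.getvalue()


line = to_line(row)
print(line, end="")

# Reading the line back recovers the same six fields.
parsed = next(csv.reader(io.StringIO(line), delimiter=";"))
```

Using the csv module rather than a plain ";".join means a template that happens to contain a semicolon would be quoted instead of silently breaking the column layout.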

Generating the queries:

For natural language question:

Now, to generate a viable natural language question, we divide a question into 3 parts:

question_starts_with[number]+prop[1]+ suffix

Example:

The natural language question template comes out to be:

When is the  birth date of <A> ?

For SPARQL query:

Now, to generate a viable SPARQL query, we similarly build the query by concatenating its parts:

(query_starts_with[number]+"where { <A>  "+ query_suffix + prop_link  +" ?x "+ query_ends_with[number])

Example:

The SPARQL Query template is of the form:

select ?x where { <A>  dbo:birthDate ?x }
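
The two concatenations (question and query) can be tried out side by side with a small sketch. The contents of question_starts_with, query_starts_with and query_ends_with, as well as the initial suffix, are sample values I assumed for illustration. The real lists live in the repository's code:

```python
# Assumed sample values, one entry per question type.
question_starts_with = ["When is the "]
query_starts_with = ["select ?x "]
query_ends_with = ["}"]


def build_templates(prop, prop_link, number=0, suffix=" of <A> ?", query_suffix=""):
    """Build the question and SPARQL templates for one property.

    prop is a [name, label, domain, range] row; prop[1] is the label.
    """
    question = question_starts_with[number] + prop[1] + suffix
    query = (query_starts_with[number] + "where { <A>  " + query_suffix
             + prop_link + " ?x " + query_ends_with[number])
    return question, query


prop = ["birthDate", "birth date", "Person", "xsd:date"]
question, query = build_templates(prop, "dbo:" + prop[0])
print(question)  # When is the birth date of <A> ?
print(query)     # select ?x where { <A>  dbo:birthDate ?x }
```

With an empty query_suffix this reproduces the first-level templates shown above. A non-empty query_suffix is what the recursive case feeds in.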

Compatible entities fetcher template

We are about to wander into somewhat more interesting territory, so tread carefully. This template generation depends on a few conditions:

if query_suffix == "":
    query_answer = "select distinct(?a) where { ?a " + prop_link + " []  } "
else:
    query_answer = ("select distinct(?a) where { ?a " + query_suffix.split(" ")[0]
                    + " [] . ?a  " + query_suffix + " " + prop_link + " ?x } ")

No new variables are used for this part. I will just give an example of the first case here; the second one is relevant to recursive question and query generation.

Example:

The template comes out as:

select distinct(?a) where { ?a dbo:birthDate []  }
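
Here is the same condition wrapped in a function so that both branches can be tried. The function name and the sample query_suffix for the second case (a dbo:spouse hop with intermediate variable ?x1) are assumptions made for illustration:

```python
def build_fetcher(query_suffix, prop_link):
    """Build the query that fetches entities compatible with a template."""
    if query_suffix == "":
        # Simple case: any entity that has the property at all.
        return "select distinct(?a) where { ?a " + prop_link + " []  } "
    # Recursive case: the first token of query_suffix is the outer property.
    return ("select distinct(?a) where { ?a " + query_suffix.split(" ")[0]
            + " [] . ?a  " + query_suffix + " " + prop_link + " ?x } ")


def normalise(text):
    """Collapse runs of whitespace so queries are easier to compare."""
    return " ".join(text.split())


simple = build_fetcher("", "dbo:birthDate")
nested = build_fetcher("dbo:spouse ?x1 . ?x1 ", "dbo:birthDate")
print(normalise(simple))
print(normalise(nested))
```

In the nested case, the fetcher only returns entities that have the outer property and whose linked entity has the inner one, so every fetched entity can actually answer the two-hop question.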

Recursive question-query generation:

There is one more important variable to look at before moving forward: it keeps track of the variable name used in the SPARQL query in each iteration. count is updated accordingly later in the code:

if count == 0:
    variable = "?x"
else:
    variable = "?x" + str(count)

To generate a recursive query, we just update some of the variables before going into the next iteration of the loop:

query_suffix = prop_link + " "+variable+" . "+variable+" "
suffix = " of "+ prop[1] +" of <A> ?"

The ordering is chosen so that the generated question and query make proper sense. Please try running some SPARQL queries to see why the variables are updated this way.
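
A worked sketch of one recursion step, under a few assumptions of mine: the outer property is dbo:spouse, the inner one is dbo:birthDate, the question and query prefixes are "When is the " and "select ?x ", and count has already been incremented to 1 so that the intermediate variable (?x1) does not collide with the answer variable ?x:

```python
def variable_name(count):
    """Name of the intermediate SPARQL variable for a given depth."""
    if count == 0:
        return "?x"
    return "?x" + str(count)


def descend(prop, prop_link, count):
    """Update suffix and query_suffix before generating the next level."""
    variable = variable_name(count)
    query_suffix = prop_link + " " + variable + " . " + variable + " "
    suffix = " of " + prop[1] + " of <A> ?"
    return suffix, query_suffix


# Assumed sample rows: [name, label, domain, range].
outer = ["spouse", "spouse", "Person", "Person"]
inner = ["birthDate", "birth date", "Person", "xsd:date"]

suffix, query_suffix = descend(outer, "dbo:" + outer[0], count=1)

question = "When is the " + inner[1] + suffix
query = "select ?x " + "where { <A>  " + query_suffix + "dbo:" + inner[0] + " ?x " + "}"

print(question)  # When is the birth date of spouse of <A> ?
print(query)     # select ?x where { <A>  dbo:spouse ?x1 . ?x1 dbo:birthDate ?x }
```

The question reads from the innermost property outwards ("birth date of spouse of <A>"), while the query walks from <A> outwards through ?x1. That is why suffix gets the outer label prepended and query_suffix gets the outer triple pattern placed before the inner property.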

Example:


All these functions are then wrapped in a script called generate_templates.py. More work will be done to improve this after the next meeting.

Fixing the issues with GERBIL

Finalising the REST Server

Push all the changes to the repository properly

A pull request with all the updates was made: https://github.com/dbpedia/neural-qa/pull/14#issue-288853232

# Update 17th June 2019
## Stage 0
### Community Bonding period fixes
- #12  #9
## Stage 1 | 2 | 3 (Refer to Anand's proposal for reference)
###  gsoc/ folder
- Aman's pipeline was streamlined and automated
- All the inconsistencies were fixed and the flow works as intended
- The whole code in gsoc/anand was made python 3 compatible
- running logs can be found in the data section of working-gsoc-anand branch
- Full automation of pipeline_3
- Automatic retrieval of URLs and properties built
- Whole structure of proposed pipeline was completed
- Recursive question SPARQL generation completed
- Query Check done
- Answer query done
- Multiple types of questions based on entity range done: more will be added
- Testing on nmt Model done with BLEU > 95 (log present in working-gsoc-anand branch)
### NMT sub-module updated to current master:  Updated to 0be864257a76c151eef20ea689755f08bc1faf4e
- #11

Update the blogs

Updated: https://anandpanchbhai.com/A-Neural-QA-Model-for-DBpedia/WeekFive.html
