Structural Mapping between Natural Language Questions and SQL Queries

Giordani, Alessandra

doi:10.15168/11572_368386

A core problem in data mining is retrieving data in an easy, human-friendly way. Automatically translating natural language questions into SQL queries would allow the design of database systems that are effective and useful from the user's viewpoint. In this thesis, we approach this problem by mapping between natural language (NL) and SQL syntactic structures. The mapping is derived automatically by applying machine learning algorithms. In particular, we generate a dataset of pairs of NL questions and SQL queries, each represented by its syntactic tree as produced automatically by the respective syntactic parser. Then, we train a classifier to detect correct and incorrect pairs of questions and queries using kernel methods along with Support Vector Machines. Experimental results on two different datasets show that our approach is viable for selecting the correct SQL query for a given natural language question in two target domains. Given that these preliminary results were encouraging, we implemented an SQL query generator that creates a set of candidate SQL queries, which we rerank with an SVM ranker based on tree kernels. In particular, we exploit linguistic dependencies in the natural language question and the database metadata to build a set of plausible SELECT, WHERE and FROM clauses enriched with meaningful joins. Then, we combine all the clauses to obtain the set of all possible SQL queries, producing candidate queries to answer the question. This approach can be applied recursively to deal with complex questions that require nested sub-queries. We sort the candidates by correctness score using a weighting scheme applied to the query generation rules. Then, we use an SVM ranker trained with structural kernels to reorder the list of question and query pairs, where both members are again represented as syntactic trees.
The F-measure of our model on standard benchmarks is in line with the best models (85% on the first question), which use external and expensive hand-crafted resources such as the semantic interpretation. Moreover, we can provide a set of candidate answers with a recall of the correct answer of about 92% and 96% on the first 2 and 5 candidates, respectively.
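
The clause-combination and scoring steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the clause strings, the rule weights, the choice of multiplying weights to score a candidate, and the function names `generate_candidates` and `recall_at_k` are hypothetical and are not taken from the thesis. The SVM reranking stage with tree kernels is not shown.

```python
from itertools import product

# Hypothetical clause candidates, each paired with an illustrative rule weight.
# None of these strings or numbers come from the thesis.
SELECTS = [("SELECT city.name", 0.9), ("SELECT state.name", 0.4)]
FROMS = [
    ("FROM city", 0.8),
    ("FROM city JOIN state ON city.state_id = state.id", 0.6),
]
WHERES = [("WHERE city.population > 100000", 0.7), ("", 0.2)]


def generate_candidates(selects, froms, wheres):
    """Combine every SELECT, FROM and WHERE clause into a candidate query.

    Each candidate is scored by the product of its clause weights
    (an assumed weighting scheme), and the list is sorted best-first.
    """
    candidates = []
    for (s, ws), (f, wf), (w, ww) in product(selects, froms, wheres):
        sql = " ".join(part for part in (s, f, w) if part)
        candidates.append((sql, ws * wf * ww))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def recall_at_k(ranked_lists, gold_queries, k):
    """Fraction of questions whose gold query is among the top-k candidates."""
    hits = 0
    for ranked, gold in zip(ranked_lists, gold_queries):
        if gold in [sql for sql, _ in ranked[:k]]:
            hits += 1
    return hits / len(gold_queries)


if __name__ == "__main__":
    for sql, score in generate_candidates(SELECTS, FROMS, WHERES)[:3]:
        print(f"{score:.3f}  {sql}")
```

In a full pipeline, the sorted list would then be handed to a reranker, and `recall_at_k` with k = 2 or k = 5 corresponds to the kind of "first 2 and 5 candidates" measurement the abstract reports.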

Structural Mapping between Natural Language Questions and SQL Queries / Giordani, Alessandra. - (2012), pp. 1-138. [10.15168/11572_368386]

2012-01-01

Defended on: 2012
Cycle: XXII
Academic year: 2011-2012
Department: Ingegneria e Scienza dell'Informaz (cess.4/11/12)
Doctoral programme: Informatica e telecomunicazioni (until academic year 2020-21, 36th cycle)
Unitn internal supervisor: Moschitti, Alessandro
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_368386
Language: English
Reference SSD (valid until 24/06/2024): INF/01 - Informatica (Computer Science)
Appears in types: 08.1 Doctoral Thesis

Files in this record:

File	Size	Format
phd-thesis.pdf (open access; type: Doctoral Thesis; licence: All rights reserved)	3.7 MB	Adobe PDF