Grammar-based Neural Text-to-SQL Generation


Kevin Lin      Ben Bogin      Mark Neumann
      Jonathan Berant       Matt Gardner

Allen Institute for Artificial Intelligence
School of Computer Science, Tel-Aviv University
{kevinl, markn, mattg}{ben.bogin, joberant}

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar. Grammar-based decoding has shown significant improvements for other semantic parsing tasks, but SQL and other general programming languages have complexities not present in logical formalisms that make writing hierarchical grammars difficult. We introduce techniques to handle these complexities, showing how to construct a schema-dependent grammar with minimal over-generation. We analyze these techniques on ATIS and Spider, two challenging text-to-SQL datasets, demonstrating that they yield 14–18% relative reductions in error.


1 Introduction

Natural language interfaces to databases (NLIDB), the task of mapping natural language utterances to SQL queries, has been of interest to both the database and natural language processing communities, as effective NLIDB would allow people of all technical backgrounds to access information stored in relational databases.

Recent text-to-SQL models typically take a standard sequence-to-sequence modeling approach, encoding a sequence of natural language tokens and then decoding a sequence of SQL tokens, possibly constrained by the table schema or a SQL grammar in some way (Iyer et al., 2017; Yu et al., 2018a, b). However, work in the (closely related) semantic parsing literature has shown that hierarchical, grammar-based decoding, where the output of the model changes from a sequence of tokens to a sequence of productions rules from the grammar, is often more effective (Rabinovich et al., 2017; Krishnamurthy et al., 2017; Yin and Neubig, 2017).

Applying grammar-based decoding to general programming languages such as SQL is very challenging. Constructing a grammar that constrains the outputs correctly such that it cannot generate invalid programs (“over-generate”) is difficult, as the abstract syntax trees (ASTs; Aho et al., 1986) used by the languages’ compilers (the same also applies to interpreters; we use the term compilers in this work to simplify the discussion) are not sufficiently constraining. There are trade-offs between manual effort in constructing a tight grammar, the complexity and depth of the grammar, and the learnability of the grammar by a model. These languages often define typed variables (e.g., table aliases in SQL), which means they are not context-free, requiring more complex mechanisms to handle and making it difficult to construct a grammar that completely removes over-generation. There are often classes or schemas that need to be respected when generating (e.g., table columns like city.city_name), requiring the grammar to depend on the schema of the database being queried. With SQL, this can be taken one step further, constraining (or at least encouraging) comparisons on table columns to be values in that column (e.g., WHERE city.city_name = "New York").

In this work we develop a grammar that covers more than 98% of instances with minimal over-generation in two popular datasets: ATIS (Hemphill et al., 1990), a dataset of contextual interactions with a flight database, and Spider (Yu et al., 2018c), a dataset focused on complex SQL queries over a variety of schemas, many of which are unseen at test time. We show how to modify grammar-based semantic parsers to use this grammar, and discuss how the common practice of identifier anonymization in SQL queries applies to grammar-based decoding. Interestingly, prior grammar-based parsers have their own linking mechanism which serves largely the same purpose as identifier anonymization, and we show that these two mechanisms are complementary to each other. Finally, we note that context-sensitive grammar constraints are easily handled inside the decoder, allowing us to use a relatively simple context-free grammar and impose further constraints (e.g., on the production of joins in SQL) at run-time (both during training and inference).

We apply these contributions to models for ATIS and Spider, demonstrating the effectiveness of grammar-based decoding for text-to-SQL tasks. Our model achieves 73.7% denotation accuracy on the resplit, contextual ATIS task (Suhr et al., 2018), a 4.5% absolute improvement over the prior best result, and 33.8% accuracy on the database split of Spider, a 14.1% absolute improvement over the best prior work with the same supervision.

2 SQL Grammar

In this section we discuss several important considerations when designing a grammar for a general programming language like SQL, and we present the grammar that we use in our experiments.

When doing grammar-based decoding on a programming language, one obvious potential starting place, which has been used repeatedly in prior work, is to directly use a compiler’s grammar and the ASTs it produces (Yin and Neubig, 2017; Iyer et al., 2018). This approach, while simple and intuitive, has several drawbacks. First, these grammars are written to recognize and parse presumed-valid programs, and further checking is done by the compiler after the ASTs are produced. This means that using the compiler’s grammar for grammar-based decoding will significantly over-generate programs, still requiring a semantic parser to learn which of the possible programs that it can produce are actually valid programs. Second, these grammars are also typically very deep, with many intermediate non-terminals and unary productions that lead to very long derivations for simple programs. It is easier for a semantic parser to learn to produce shorter derivations, so a shallower grammar would be preferable.

The main issue that leads a compiler’s grammar to over-generate in a semantic parser is that a programming language is not context free, while the compiler’s grammar for it generally is. The context-sensitive parts of a programming language revolve around variables, their definitions, and their use. A variable can have user- or schema-defined types, which restrict the identifiers that are validly used in conjunction with it. For example, a class in python would only have a limited set of member variables and functions, and a SQL identifier referring to a table in a database, such as city, only has a limited set of column identifiers that can be used with it, such as city.city_name.

We address these issues for SQL by designing shallow parsing expression grammars (Ford, 2004) that capture the minimum amount of SQL necessary to cover most of the examples in a given dataset. Limiting the SQL covered to only what is necessary allows the grammar to be more compact, which aids the learnability of the grammar for the semantic parser. Unfortunately, this means that the grammars we write are dataset-specific, though we share a common base that needs only minimal modification for a new dataset. A simplified base grammar is shown in Figure 1. (The full grammar and code for reproducing the experiments are available in AllenNLP (Gardner et al., 2018).)

Figure 1: The base SQL grammar before augmentation with schema specific and utterance specific rules. The non-terminals are shown in boxes.
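To make the shape of such a shallow grammar concrete, the following is an illustrative sketch only: a toy fragment represented as a plain Python mapping from non-terminals to candidate productions. The actual grammar we use is a parsing expression grammar, and the non-terminal names here only loosely follow Figure 1.

```python
# Toy fragment of a shallow SQL grammar (hypothetical representation, not
# the paper's code): each non-terminal maps to a list of productions, and
# each production is a list of symbols; quoted symbols are terminals.
BASE_GRAMMAR = {
    "statement": [["query", '";"']],
    "query": [['"("', '"SELECT"', "distinct", "select_results",
               '"FROM"', "table_refs", "where_clause", '")"']],
    "distinct": [['"DISTINCT"'], ['""']],
    "select_results": [["col_refs"]],
    "table_refs": [["table_name"]],
    "where_clause": [['"WHERE"', "condition"], ['""']],
    # table_name and col_ref are intentionally left undefined here; they
    # are filled in per-example from the database schema (see below).
}

def productions(grammar, non_terminal):
    """Return the candidate right-hand sides for a non-terminal."""
    return grammar.get(non_terminal, [])
```

Keeping the grammar this shallow means a simple query derives in only a handful of rule applications, which is the learnability property the text argues for.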

In order to handle the context-sensitive components of SQL, we use two approaches. First, we note that some amount of context sensitivity can be handled by adding additional non-terminals to a context-free grammar (c.f. Petrov et al. (2006)), and we use this approach to ensure consistency of table, column, and value references. Second, for more complex context sensitivity, such as ensuring that joined tables in a SQL query share a common foreign key, we use runtime constraints on production rule expansion (during decoding at both training and inference time) to ensure that only valid programs are allowed (c.f. Liang et al. (2017)).

Adding schema non-terminals: In the base grammar, the table_name and col_ref non-terminals are left undefined. For each example, both during training and inference, we examine the database schema associated with that example and automatically add grammar rules for these non-terminals. All tables in the database have their names added as valid productions of the table_name non-terminal, and each column in each table gets a production for the col_ref non-terminal that generates the table and column name together (e.g., city.city_name). We further only allow comparisons to table columns with values that occur in that column. For example, in a WHERE clause, we only allow statements such as city.city_name = VALUE where VALUE is actually a value in the city.city_name column in the database. We accomplish this by modifying the biexpr non-terminal to have one possible production for each table column, making use of a new non-terminal for values in that column. An example of each of these kinds of rules is shown in Figure 2. Note that a compiler’s grammar would allow arbitrary identifiers in these conditions; in order to properly constrain the productions allowed by the semantic parser in a given context, we need to add these schema-dependent production rules.

Figure 2: Examples of additional rules added to the base SQL grammar based on the database schema and on entities detected in the utterance (here WESTCHESTER COUNTY, DETROIT, and 1701).

The binary comparison rule mentioned above, restricting column comparisons to only accept values in the corresponding column, is occasionally too strict. If our input utterance mentions “flights before 5:01pm”, we want to be able to have a clause like WHERE flight.departure_time < 1701. In order to handle cases like this, we additionally examine the input utterance and dynamically add rules to the grammar based on values seen there. These are largely based on heuristic detection of numbers and times in the input. This is also shown in Figure 2.
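A minimal sketch of this dynamic augmentation, under the assumption that the grammar is represented as a mapping from non-terminals to productions (the schema format, rule encoding, and function name here are hypothetical, not the paper's implementation):

```python
import copy
import re

def augment_grammar(base_grammar, schema, utterance):
    """Add schema- and utterance-specific rules to a copy of a base grammar.

    schema: {table_name: {column_name: set_of_values}} (assumed format).
    Each production is a list of symbols; quoted symbols are terminals.
    """
    grammar = copy.deepcopy(base_grammar)
    # Every table name becomes a valid table_name production.
    grammar["table_name"] = [[f'"{t}"'] for t in schema]
    grammar["col_ref"] = []
    grammar["biexpr"] = []
    for table, columns in schema.items():
        for column, values in columns.items():
            col = f"{table}.{column}"
            grammar["col_ref"].append([f'"{col}"'])
            # One biexpr production per column, whose value non-terminal
            # only generates values actually present in that column.
            value_nt = f"{col}_value"
            grammar["biexpr"].append([f'"{col}"', "binaryop", value_nt])
            grammar[value_nt] = [[f'"{v}"'] for v in sorted(values)]
    # Heuristically detected numbers/times in the utterance also become
    # candidate values, so "flights before 5:01pm" can yield "1701".
    for number in re.findall(r"\d+", utterance):
        grammar.setdefault("number", []).append([f'"{number}"'])
    return grammar
```

The `binaryop` non-terminal is left abstract here; the point of the sketch is only that the value non-terminals are derived per column and per utterance.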

Both of these mechanisms for dynamically producing production rules, either from the database schema or from the input utterance, can generate rules at test time that were never seen during training. In order to handle this, we distinguish between global rules that come from the base grammar, and linked rules that are dynamically generated. These two kinds of rules will be parameterized differently in the model (§3), so that the model can handle unseen rules at test time.

Run-time Grammar Constraints: For datasets that involve joins we apply additional constraints at run-time. We keep track of two sets of tables: the used tables U and the required tables R. When a table is SELECTed or JOINed, it is added to U, and when a column is SELECTed, the table that it belongs to is added to R. First, when generating WHERE, ORDER BY, GROUP BY, and JOIN conditions, we eliminate rules that generate columns from tables outside U. Second, when predicting the last join, if there exists a table in R that is not in U, we remove all rules that do not join that table. Third, we constrain the number of joins using the used and required tables: if more than one required table is still unused, there must be more joins, so we remove all rules that stop joining; if exactly one required table is still unused, we do not allow rules that generate more than one further join, since we assume no self-joins.
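These run-time constraints amount to filtering the candidate rules at each decoding step. A rough sketch, assuming a hypothetical rule encoding in which each candidate rule carries a kind and, where relevant, a table:

```python
def filter_rules(rules, used, required):
    """Filter candidate production rules by the run-time join constraints.

    rules:    list of dicts like {"kind": "column"|"join"|"stop_joins",
              "table": <table name>} (a hypothetical encoding, for
              illustration only).
    used:     set of tables already SELECTed or JOINed (U).
    required: set of tables whose columns are referenced (R).
    """
    missing = required - used  # required tables not yet joined
    allowed = []
    for rule in rules:
        # Columns may only come from tables that are already in use.
        if rule["kind"] == "column" and rule["table"] not in used:
            continue
        # If required tables remain unjoined, we cannot stop joining.
        if rule["kind"] == "stop_joins" and missing:
            continue
        # On the last join, only the one still-missing table may be joined.
        if (rule["kind"] == "join" and len(missing) == 1
                and rule["table"] not in missing):
            continue
        allowed.append(rule)
    return allowed
```

This is applied during decoding at both training and inference time, so the model never receives probability mass on rules that would produce an invalid join structure.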

Other Considerations: Many current text-to-SQL datasets in the NLP community make liberal use of table aliases when they are not strictly necessary. These aliases give traditional sequence-to-sequence models some consistency when predicting output tokens, but unnecessarily complicate the grammar in grammar-based decoding. It makes the grammar deeper, and requires the parser to keep track of additional identifiers that are hard to model. Accordingly, we simply undo the table alias normalization that has been done in these datasets before training our model, and add it back in during post-processing of our predicted queries if the dataset requires it. Table aliases are sometimes required in complex SQL programs, but these are very rare in current datasets, and we do not currently handle them.

Linearizing a syntax tree: Given this dynamically generated grammar for a given example, during training we parse the input SQL into an AST. Following Krishnamurthy et al. (2017), we then linearize this tree depth-first, left-to-right, to get a sequence of production rules for the parser to learn to generate. During decoding, the grammar (along with runtime constraints) is used to constrain the production rules available to the model at each timestep. An example query derivation in this grammar can be seen in Figure 3.

(a) Gold SQL label
statement -> [query, ";"]
query -> ["(", "SELECT", distinct, select_results, "FROM", table_refs, where_clause, ")"]
distinct -> ""
select_results -> [col_refs]
(b) Gold derivation
Figure 3: An example of how gold SQL queries are transformed into gold derivations for model supervision. Derivations are formed from a depth-first traversal of the AST from the parsed statement.
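The depth-first, left-to-right linearization can be sketched as follows, assuming a simple tuple encoding of AST nodes (the encoding and function name are illustrative, not the paper's code):

```python
def linearize(node):
    """Linearize an AST into a production-rule sequence, depth-first and
    left-to-right. A node is (symbol, children); terminals have no children."""
    symbol, children = node
    if not children:
        return []  # terminals contribute no production rules
    rule = f"{symbol} -> [{', '.join(c[0] for c in children)}]"
    rules = [rule]
    for child in children:
        rules.extend(linearize(child))  # recurse left-to-right
    return rules
```

At training time the gold SQL is parsed and linearized this way to give the target rule sequence; at decoding time the same traversal order determines which non-terminal is expanded next.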

3 Model

Figure 4: Overview of our type-constrained semantic-parser based on Krishnamurthy et al. (2017). The encoder links input tokens to database values and generates link embeddings which, along with word embeddings, are fed into a bidirectional LSTM. The decoder predicts a sequence of SQL grammar rules that generate a SQL query.

To translate natural language utterances to SQL statements, we pair our grammar from Section 2 with a semantic parsing model that closely follows that of Krishnamurthy et al. (2017). (The main differences from that work are in how and when we compute linking scores, and in the identifier anonymization.) Our model takes as input an utterance, a database, and an utterance-specific grammar, and outputs a sequence of production rules that sequentially build up an AST for a SQL program. The model distinguishes between two kinds of production rules: (1) global rules that come from the base grammar and are shared across utterances, and (2) linked rules that are utterance-specific and might be unseen at test time. The base grammar rules typically determine the structure of the SQL statement, while the utterance-specific rules perform linking of words in the utterance to identifiers in the database (such as table names, column names, and column values).

Notation: The utterance is denoted as a sequence of tokens x_1, ..., x_n. Identifiers in the database that may be unseen at test time, such as the name of a city or an airport code in ATIS, or table and column names in Spider, are denoted as e, and the whole set of identifiers is denoted as E. The production rule that generates a particular identifier e is denoted as r_e.

Identifier Linking: We use simple string matching heuristics to link words or phrases in the input utterance to identifiers in the database. (This heuristic string matching could easily be replaced by a learned function, as was done in prior work focused on WikiTableQuestions (Krishnamurthy et al., 2017), but we found that to be unnecessary for these text-to-SQL datasets.) For example, if “Boston” appears in the utterance, then it should be linked to the linked rules that produce the relevant identifiers, such as city_name_string -> "BOSTON" and city_code -> "BOS". We first generate a linking score s(x_i, e) between each utterance token x_i and each identifier e.

We use this linking score in both the encoder and the decoder. In the encoder, we generate a link embedding for each token in the utterance that represents the database values it is linked to. For each linked rule that generates a database identifier e, we generate a type vector t_e based on the non-terminal type of the identifier. This allows the model to handle unseen identifiers at test time. The link embedding l_i for token x_i is then computed as a weighted sum of the type vectors, l_i = Σ_e p(e | x_i) t_e, where p(e | x_i) is obtained by normalizing the linking scores s(x_i, e) with a softmax. The decoder section describes the use of the linking in decoding.
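The link embedding computation can be sketched in plain Python as a softmax-weighted sum of type vectors (the function name and dictionary formats are illustrative assumptions, not the paper's code):

```python
import math

def link_embedding(linking_scores, type_vectors):
    """Link embedding for one utterance token.

    linking_scores: {identifier: score} for this token, s(x_i, e).
    type_vectors:   {identifier: list of floats}, one type vector t_e per
                    linked identifier (shared across identifiers of a type).
    Returns the softmax-weighted sum of type vectors.
    """
    ids = list(linking_scores)
    exps = [math.exp(linking_scores[e]) for e in ids]
    total = sum(exps)
    probs = [x / total for x in exps]  # softmax over linking scores
    dim = len(next(iter(type_vectors.values())))
    emb = [0.0] * dim
    for p, e in zip(probs, ids):
        for j in range(dim):
            emb[j] += p * type_vectors[e][j]
    return emb
```

Because only the identifier's type vector (not an identifier-specific embedding) enters this sum, a never-before-seen city name still produces a sensible link embedding at test time.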

Encoder: The encoder is a bi-directional LSTM (Hochreiter and Schmidhuber, 1997) that takes as input a concatenation of a learned word vector and the link embedding for each token. To incorporate the history of the interaction in ATIS, we concatenate the previous utterances delimited with special tokens. The model is able to access the previous utterances but not the previous queries.

Decoder: The decoder is an LSTM with attention on the input utterance that predicts production rules from the grammar described in Section 2. At each step, the decoder builds up the SQL query by applying a grammar rule to the leftmost non-terminal in the AST. The production rules associated with any particular non-terminal can be global rules, linked rules, or both. Global rules are parameterized with an embedding, and the model assigns logits to these rules using a multilayer perceptron. Linked rules are parameterized using the decoder’s attention over the input utterance and the linking scores mentioned earlier. At step t, the decoder computes an attention distribution a_t over the input utterance and then computes the logit for the linked rule of identifier e as Σ_i a_{t,i} s(x_i, e). Logits for all rules are jointly normalized with a softmax to produce a distribution over the available production rules at each decoding step. We note that this parameterization of linked rules through the attention mechanism is a key difference from traditional sequence-to-sequence models. It is similar to a copy mechanism (Gu et al., 2016), though we are “copying” production rules that are linked to utterance tokens, not the utterance tokens themselves.
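The linked-rule scoring and the joint normalization can be sketched as follows (function names are illustrative; this is the scoring scheme as described, not the actual model code):

```python
import math

def linked_rule_logit(attention, linking_scores):
    """Logit for one linked rule: attention-weighted sum of linking scores.

    attention:      [a_{t,1}, ..., a_{t,n}] over utterance tokens at step t.
    linking_scores: [s(x_1, e), ..., s(x_n, e)] for the rule's identifier e.
    """
    return sum(a * s for a, s in zip(attention, linking_scores))

def rule_distribution(global_logits, linked_logits):
    """Jointly normalize global- and linked-rule logits with one softmax."""
    logits = global_logits + linked_logits
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Since linked-rule logits depend only on attention and linking scores, a linked rule never seen in training still receives a well-defined probability, which is what makes zero-shot identifiers tractable.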

Identifier anonymization, which has long been done in text-to-SQL models, is the process of taking database identifiers that appear in both the question and SQL query and replacing them with dummy variables, to simplify the prediction task. For example, the utterance “what flights go from boston to orlando” would be preprocessed to be “what flights go from CITY_NAME_0 to CITY_NAME_1”. This anonymization has some of the same goals as our identifier linking—enabling prediction of identifiers not seen during training—but it also simplifies the encoder’s vocabulary, because all city names get removed from the vocabulary and replaced with the dummy variable. In our model, we experiment with both anonymization and linking at the same time, treating the dummy variables as linked production rules. Importantly, however, we do the anonymization using our linking heuristics only, not looking at the SQL query, so our evaluation is equivalent to a non-anonymized setting.
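A minimal sketch of such utterance-only anonymization, assuming a precomputed gazetteer of database values (the gazetteer format and function name are hypothetical; crucially, the gold SQL is never consulted):

```python
def anonymize(utterance, gazetteer):
    """Replace database values found in the utterance with typed dummies.

    gazetteer: {phrase: type}, e.g. {"boston": "CITY_NAME"} (assumed
    single-token phrases for simplicity of this sketch).
    """
    counters = {}  # per-type counter, so repeats get fresh indices
    tokens = []
    for token in utterance.split():
        if token in gazetteer:
            t = gazetteer[token]
            idx = counters.get(t, 0)
            counters[t] = idx + 1
            tokens.append(f"{t}_{idx}")
        else:
            tokens.append(token)
    return " ".join(tokens)
```

Because the replacement is driven by the same linking heuristics used elsewhere in the model, evaluating with this preprocessing remains equivalent to a non-anonymized setting.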

Training: The model is given access to utterances paired with one or more corresponding SQL queries. The SQL queries are parsed into their derivations, which are used as supervision for the model. The model is then trained to maximize the log-likelihood of the labeled query. If there are multiple gold queries, we train only on the one with the shortest derivation.

4 Experiments

                     Development      Test
                     Q      D         Q      D
Suhr et al. (2018)   37.5   62.5      43.6   69.2
Ours                 39.1   65.8      44.1   73.7
Table 1: Comparison of our model with the best prior work on the ATIS dataset. Q and D correspond to exact query accuracy and denotation accuracy. The main difference between the models is that ours uses grammar-based decoding while Suhr et al. (2018)’s is token-based.
                    Development   Test
Yu et al. (2018b)   18.9          19.7
Ours                34.8          33.8
Table 2: Comparison of our model with the best prior work on the Spider dataset, with the exact component matching accuracy.

4.1 Datasets

We evaluate on two datasets, the ATIS flight planning dataset (Hemphill et al., 1990) and the Spider dataset (Yu et al., 2018c).

ATIS: We use Suhr et al. (2018)’s data re-split to avoid scenario bias and to make use of their pre-processing to identify times with UWTime (Lee et al., 2014). The dataset consists of 1148/380/130 train/dev/test interactions. There is an average of 7 utterances per interaction. We use Suhr et al. (2018)’s model as our baseline, as it uses the same dataset, preprocessing, and supervision, but with token-based decoding.

For evaluation, we use exact query match accuracy and denotation accuracy. Query match accuracy is the percentage of queries that have the same sequence of SQL tokens as the reference query. Denotation accuracy is the percentage of queries that execute to the same table as the gold query, where credit is not given to queries that do not execute. Denotation accuracy is a particularly important evaluation metric when considering SQL as a target language, as the ordering of various clauses does not affect query execution (Xu et al., 2017).
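Denotation accuracy can be checked mechanically by executing both queries and comparing result sets. A sketch using Python's standard sqlite3 module (the function name is illustrative; this is not the paper's evaluation script):

```python
import sqlite3

def same_denotation(conn, predicted_sql, gold_sql):
    """True iff both queries execute to the same multiset of rows.

    A predicted query that fails to execute receives no credit, matching
    the metric as described.
    """
    try:
        predicted = sorted(tuple(r) for r in conn.execute(predicted_sql))
    except sqlite3.Error:
        return False  # non-executable predictions get no credit
    gold = sorted(tuple(r) for r in conn.execute(gold_sql))
    return predicted == gold
```

Sorting the rows makes the comparison insensitive to row order, which is why clause orderings that do not change the result are scored as correct.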

Spider: The key difference between Spider and other text-to-SQL datasets is that databases not seen in training can appear in the test set. In the database split, databases are split randomly into 146/20/40 train/dev/test databases. To do well on the database split, the model needs to learn to compose various SQL operators and generalize to new schemas, as all databases will be unseen at test time. To compare with prior work, we report the exact component matching score. The predicted query is decomposed into its SELECT, WHERE, GROUP BY, ORDER BY, and KEYWORDS components. Each component of the predicted query and of the ground truth is then decomposed into a set of subcomponents, and the two sets are checked for exact match. The predicted query is correct when all components match.
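A rough sketch of the idea behind component matching, splitting a query by clause keywords into order-insensitive sets of items (the real Spider evaluation script is considerably more thorough; keyword list, names, and splitting rules here are simplifying assumptions):

```python
import re

# FROM is included only so the split lands at clause boundaries; the
# official metric uses a different component inventory.
CLAUSE_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY"]

def components(sql):
    """Decompose a (non-nested) query into per-clause sets of items."""
    pattern = "(" + "|".join(CLAUSE_KEYWORDS) + ")"
    pieces = re.split(pattern, sql)
    comps = {}
    # re.split with a capturing group alternates keywords and bodies.
    for keyword, body in zip(pieces[1::2], pieces[2::2]):
        comps[keyword] = {item.strip() for item in body.split(",")
                          if item.strip()}
    return comps

def exact_component_match(predicted, gold):
    """Correct only if every clause's component set matches exactly."""
    return components(predicted) == components(gold)
```

Treating each clause as a set is what makes the metric robust to, e.g., reordered columns in a SELECT list.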

4.2 Implementation Details

The two datasets we experiment with were designed to test different aspects of the text-to-SQL task, and thus we include different parts of the model for each dataset. Our model for the ATIS dataset includes identifier anonymization; since this dataset is evaluated on execution accuracy, linking tokens in the utterance to database values is extremely important. The ATIS model does not include the run-time constraints, as there are no joins or table aliases in this dataset. Conversely, Spider already anonymizes database values but has many joins, so our model omits identifier anonymization and database value generation but includes the run-time constraints.

We use the sparse Adam optimizer with a learning rate of 0.001 (Kingma and Ba, 2014), a batch size of 32, and an initial patience of 10 epochs. We use accuracy on the dev set as the metric for early stopping and hyperparameter tuning. We use uniform Xavier initialization for the weights of the LSTMs and zero vectors for the biases (Glorot and Bengio, 2010). The word embeddings and identifier type embeddings are both of size 400 and are not pretrained. The encoder and decoder each contain 1 layer with hidden size 800. We apply dropout with probability 0.5 after the encoder. During training, we train on instances whose derivations are shorter than 300 steps, and during inference we limit the decoder to 300 generation steps. For incorporating context on ATIS, we allow the model to see the past 3 utterances as context. During evaluation, we use beam search with a beam size of 10.

4.3 Results

Table 1 shows a comparison of our model against the best prior published result on the context-dependent ATIS dataset. Our model, which includes identifier linking, link embeddings, and type constraints, yields a 4.5% improvement in denotation accuracy over prior work. Table 2 shows a comparison of our model with previous work on the Spider dataset, and shows that it yields a 14.1% increase in exact component matching compared to the best previously published result. (Yu et al. (2018b) also present a model that reaches 27.2% accuracy, but it uses additional manual annotations; Table 2 shows what is included on the official Spider leaderboard.)

5 Discussion

Linking   Link Embedding   Anon.   Acc.
Yes       No               No      57.1
Yes       Yes              No      60.4
Yes       No               Yes     64.1
No        Yes              Yes     60.6
Yes       Yes              Yes     65.8
Table 3: Model ablations on ATIS, ablating the linked rules during decoding (making them global rules), the link embedding, and the identifier anonymization, showing denotation accuracy on the development set.

Table 3 presents ablations of various components of the model. In the setting without identifier anonymization, the link embedding improves denotation accuracy by 3.3%. This is because identifiers need to be accounted for not only to generate values, but also to generate the correct query structure. Figure 5 shows that the model with link embeddings is able to use the type information to generate the correct query structure and values, even for identifiers that have low frequency in the dataset. In this case, even before the model generates the linked identifier, fare_basis . fare_basis_code = 'F', the model has to generate the correct columns in the SELECT clause and table in the FROM clause. With the link embedding, the model correctly identifies that it needs to select from the fare_basis table, while the model without the link embedding incorrectly selects the class_of_service table.

Constrained Columns   Constrained Values   Acc.
No                    No                   55.8
Yes                   No                   56.2
Yes                   Yes                  65.8
Table 4: Grammar ablations on ATIS, ablating the column and value consistency constraints, showing denotation accuracy on the development set.
Alias Pre.   Runtime Constraints   Acc.
No           No                    29.8
Yes          No                    30.7
Yes          Yes                   34.9
Table 5: Grammar ablations on Spider, ablating the preprocessing for handling table aliases and the run-time constraints, showing exact component accuracy on the development set.

Table 4 presents ablations of the schema-dependent grammar constraints on the ATIS dataset. We find that adding the constraint that columns appear with their table does not significantly improve performance. This could be because the same tables are seen during training and test in ATIS, so associating tables with columns is not as challenging. However, removing the constraint on values decreases denotation accuracy by 9.6%, showing that generating the correct value in a WHERE clause is a central problem in this dataset. Table 5 shows that both the table alias preprocessing and the run-time constraints improve our model on the Spider dataset.

SELECT DISTINCT fare_basis . fare_basis_code,
                  fare_basis . booking_class,
                  fare_basis . class_type,
                  fare_basis . premium,
                  fare_basis . economy,
                  fare_basis . discounted,
                  fare_basis . night,
                  fare_basis . season,
                  fare_basis . basis_days
 FROM fare_basis
 WHERE fare_basis . fare_basis_code = 'F' ) ;
(a) Model with link embeddings
SELECT DISTINCT class_of_service . booking_class,
                  class_of_service . rank,
                  class_of_service . class_description
FROM class_of_service
WHERE class_of_service . booking_class = 'F';
(b) Model without link embeddings
Figure 5: Query generated with link embeddings (a), which matches the gold query, and without link embeddings (b), for the input utterance “what is fare code f”.

We also experimented with a production rule copy mechanism similar to that of Suhr et al. (2018). While copied production rules shorten the derivation and aid interpretability by showing which subtrees come from previous queries, we did not observe significant change in accuracy.

One final point that highlights the complexity of constructing grammars is that the ordering of recursive rules matters. For the col_refs rule in Figure 3, and similarly for JOIN clauses, switching between left and right branching can cause a several-point difference in performance. Automatically determining the optimal grammar is an interesting direction for future research.

5.1 Error Analysis

Since linked rules dealing with numbers and times are only added to the grammar based on the utterance text, the grammar will only parse a query correctly when all identifiers in the query can be detected in the utterance. By manually inspecting the preprocessed utterances and gold SQL labels for ATIS, we found that UWTime identified incorrect dates in 27.6% of the unparseable queries. On the queries that can be parsed, our model performs substantially better, yielding 52% query match accuracy and 80% denotation accuracy, which suggests that improving datetime parsing could have a significant impact on performance.

In addition, we manually examined model output for 70 development set queries on the ATIS dataset. We found that 70% of errors come from either linking or missing constraints. In particular, conflating references to airport tables and city tables was the cause of many errors, as references to cities and airports in the utterance are particularly ambiguous, resulting in poor linking using string heuristics. The remaining 30% of errors stem from a variety of sources including difficulty in resolving anaphora, ambiguity in references to time, and selecting incorrect tables.

6 Related Work

Text-to-SQL: Generating SQL queries from English queries has been a longstanding challenge that has interested both the database and NLP communities (Androutsopoulos et al., 1995). More generally, semantic parsing into logical formalisms has been studied extensively in the NLP community (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al., 2011). A relevant line of work in semantic parsing has been treating the problem as a sequence generation task by linearizing trees (Dong and Lapata, 2016; Alvarez-Melis and Jaakkola, 2016).

Datasets: We evaluate on the Spider and ATIS datasets, two datasets that present challenges not present in other text-to-SQL datasets. Spider is the most difficult in terms of query complexity and requires generalizing to unseen databases at test time (Yu et al., 2018c). ATIS requires handling context-dependent utterances and contains a large number of tables per database. Other well studied datasets include Restaurants (Ana-Maria Popescu and Kautz, 2003), Academic (Li and Jagadish, 2014) and WikiSQL (Zhong et al., 2017). There has recently been work in standardizing the many proposed text-to-SQL datasets (Finegan-Dollak et al., 2018).

Grammar-based decoding: Semantic parsers that output production rules from a grammar instead of directly outputting tokens have been studied for other formal languages such as λ-DCS and for general-purpose programming languages (Krishnamurthy et al., 2017; Rabinovich et al., 2017; Yin and Neubig, 2017). Grammar-based methods have been explored by Yin and Neubig (2018) for WikiSQL. However, as noted by Finegan-Dollak et al. (2018), WikiSQL is composed of relatively simple SQL queries, with over half of the queries of the form (SELECT col AS result FROM table WHERE col = value), and can be parsed with just 4 grammar rules. The work most similar to ours is Yu et al. (2018b), which also exploits a SQL-specific grammar to constrain the output by structuring it as a set of recursive modules. However, they still output tokens instead of production rules, and have a more complex set of modules in their decoder. Our method considerably outperforms this work.

Zero-shot semantic parsing: One of the main challenges in Spider is handling databases at test time that were not seen during training, in a zero-shot setting. Zero-shot semantic parsing has been studied before (Herzig and Berant, 2018; Lake and Baroni, 2018), with the best method using a complex two-step processes to decouple program structure from identifiers. Our grammar-based model, with separate handling for global rules and linked rules, naturally performs this decoupling without additional complexity.

7 Conclusion

We proposed a model for the NLIDB task that uses a dynamic, schema-dependent SQL grammar to guide the decoding process, together with a deterministic entity linking module. Compared to prior work, we show that decoding into a structured output with type constraints gives considerable improvements in performance, yielding a 4.5% absolute increase in denotation accuracy on ATIS and a 14.1% absolute increase in exact component matching on Spider over the best prior work. Our results suggest that incorporating type information, whether through link embeddings or identifier anonymization, and modeling context sensitivity are both important for the task.


  • Aho et al. (1986) Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley.
  • Alvarez-Melis and Jaakkola (2016) David Alvarez-Melis and Tommi S Jaakkola. 2016. Tree-structured decoding with doubly-recurrent neural networks.
  • Popescu et al. (2003) Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 149–157.
  • Androutsopoulos et al. (1995) Ion Androutsopoulos, Graeme D Ritchie, and Peter Thanisch. 1995. Natural language interfaces to databases–an introduction. Natural language engineering, 1(1):29–81.
  • Dong and Lapata (2016) Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 33–43.
  • Finegan-Dollak et al. (2018) Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Xiang Lin, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir R. Radev. 2018. Improving text-to-sql evaluation methodology. In ACL.
  • Ford (2004) Bryan Ford. 2004. Parsing expression grammars: a recognition-based syntactic foundation. In ACM SIGPLAN Notices, volume 39, pages 111–122. ACM.
  • Gardner et al. (2018) Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. Allennlp: A deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640.
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256.
  • Gu et al. (2016) Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. CoRR, abs/1603.06393.
  • Hemphill et al. (1990) Charles T Hemphill, John J Godfrey, and George R Doddington. 1990. The atis spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990.
  • Herzig and Berant (2018) Jonathan Herzig and Jonathan Berant. 2018. Decoupling structure and lexicon for zero-shot semantic parsing. In EMNLP.
  • Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
  • Iyer et al. (2017) Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke S. Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In ACL.
  • Iyer et al. (2018) Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke S. Zettlemoyer. 2018. Mapping language to code in programmatic context. In EMNLP.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations.
  • Krishnamurthy et al. (2017) Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1516–1526.
  • Lake and Baroni (2018) Brenden M. Lake and Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In ICML.
  • Lee et al. (2014) Kenton Lee, Yoav Artzi, Jesse Dodge, and Luke S. Zettlemoyer. 2014. Context-dependent semantic parsing for time expressions. In ACL.
  • Li and Jagadish (2014) Fei Li and H. V. Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8(1):73–84.
  • Liang et al. (2017) Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. 2017. Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. In ACL.
  • Liang et al. (2011) Percy S. Liang, Michael I. Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. Computational Linguistics, 39:389–446.
  • Petrov et al. (2006) Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In ACL.
  • Rabinovich et al. (2017) Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1139–1149.
  • Suhr et al. (2018) Alane Suhr, Srinivasan Iyer, and Yoav Artzi. 2018. Learning to map context-dependent sentences to executable formal queries. In NAACL-HLT.
  • Xu et al. (2017) Xiaojun Xu, Chang Liu, and Dawn Song. 2017. Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436.
  • Yin and Neubig (2017) Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 440–450.
  • Yin and Neubig (2018) Pengcheng Yin and Graham Neubig. 2018. Tranx: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 7–12.
  • Yu et al. (2018a) Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, and Dragomir Radev. 2018a. Typesql: Knowledge-based type-aware neural text-to-sql generation. arXiv preprint arXiv:1804.09769.
  • Yu et al. (2018b) Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018b. Syntaxsqlnet: Syntax tree networks for complex and cross-domain text-to-sql task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1653–1663.
  • Yu et al. (2018c) Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018c. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
  • Zelle and Mooney (1996) John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, pages 1050–1055.
  • Zettlemoyer and Collins (2005) Luke S Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pages 658–666. AUAI Press.
  • Zhong et al. (2017) Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.