Solving puzzles described in English by automated translation to answer set programming and learning how to do that translation
Abstract
We present a system capable of automatically solving combinatorial logic puzzles given in (simplified) English. It involves translating the English descriptions of the puzzles into answer set programming (ASP) and using ASP solvers to provide solutions of the puzzles. To translate the descriptions, we use a λ-calculus based approach using Probabilistic Combinatorial Categorial Grammars (PCCG) where the meanings of words are associated with parameters to be able to distinguish between multiple meanings of the same word. The meanings of many words and the parameters are learned. The puzzles are represented in ASP using an ontology which is applicable to a large set of logic puzzles.
1 Introduction and Motivation
Consider building a system that can take as input an English description of combinatorial logic puzzles and solve them.
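In this paper we describe our development of such a system with some added assumptions. We present an evaluation of our system in terms of how well it learns to understand clues (given in simplified English) and how well it solves whole puzzles. Two key steps are involved: translating the English clues into ASP rules, and choosing an ontology for representing the puzzles in ASP.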
With respect to the first key step, we use a methodology [Baral et al. 2011] that assigns ASP-λ-Calculus formulas to words.
With respect to the second key step, there are many ASP encodings of combinatorial logic puzzles, such as those in [Baral 2003]. However, most methods in the literature assume that a human reads the English description of the puzzle and comes up with the ASP code, or with code in some high-level language [Finkel, Marek, and Truszczynski 2002] that gets translated to ASP. In our case the translation of the English description of the puzzles to ASP is done by an automated system, and moreover this system learns aspects of the translation by going over a training set. This means we need an ontology of how the puzzles are to be represented in ASP that is applicable to most (if not all) combinatorial logic puzzles.
The rest of the paper is organized as follows: We start by discussing the assumptions we made for our system. We then provide an overview of the ontology we used to represent the puzzles. We then give an overview of the natural language translation algorithm followed by a simple illustration on a small set of clues. Finally, we provide an evaluation of our approach with respect to translating clues as well as translating whole puzzles. We then conclude.
2 Assumptions and Background Knowledge
With our longer-term goal being to solve combinatorial logic puzzles specified in English, we made, as mentioned earlier, some simplifying assumptions for the current work. Here we assumed that the domains of the puzzles are given (so one does not have to extract them from the puzzle description) and focused on accurately translating the clues. Even then English throws up many challenges, and we did a human preprocessing of the puzzles to eliminate anaphora and to simplify the sentences so that each sentence translates to a single clue.
3 Puzzle representation and Ontology
For our experiments, we focus on logic puzzles from [puz 2007; puz 2004; puz 2005]. These logic puzzles have a set of basic domain data and a set of clues. To solve them, we adopt an approach where all the possible solutions are generated, and then constraints are added to reduce the number of solutions. In most cases there is a unique solution. A sample puzzle is given below, whose solution involves finding the correct associations between persons, their ranks, their animals and their lucky elements.
Puzzle domain data:
1, 2, 3, 4 and 5 are ranks
earl, ivana, lucy, philip and tony are names
earth, fire, metal, water and wood are elements
cow, dragon, horse, ox and rooster are animals
Puzzle clues:
1) Tony was the third person to have his fortune told.
2) The person with the Lucky Element Wood had their fortune told fifth.
3) Earl’s lucky element is Fire.
4) Earl arrived immediately before the person with the Rooster.
5) The person with the Dragon had their fortune told fourth.
6) The person with the Ox had their fortune told before the one whose Lucky Element is Metal.
7) Ivana’s Lucky Animal is the Horse.
8) The person with the Lucky Element Water has the Cow.
9) The person with Lucky Element Water did not have their fortune told first.
10) The person with Lucky Element Earth had their fortune told exactly two days after Philip.
The above puzzle can be encoded as follows.
% DOMAIN DATA
index(1..4). eindex(1..5).
etype(1, name).
element(1,earl). element(1,ivana). element(1,lucy). element(1,philip). element(1,tony).
etype(2, element).
element(2,earth). element(2,fire). element(2,metal). element(2,water). element(2,wood).
etype(3, animal).
element(3,cow). element(3,dragon). element(3,horse). element(3,ox). element(3,rooster).
etype(4, rank).
element(4,1). element(4,2). element(4,3). element(4,4). element(4,5).

% CLUES and their translation
% Tony was the third person to have his fortune told.
:- tuple(I, tony), tuple(J, 3), I!=J.
% The person with the Lucky Element Wood had their fortune told fifth.
:- tuple(I, wood), tuple(J, 5), I!=J.
% Earl’s lucky element is Fire.
:- tuple(I, earl), tuple(J, fire), I!=J.
% Earl arrived immediately before the person with the Rooster.
:- tuple(I, earl), tuple(J, rooster), tuple(I, X), tuple(J, Y), etype(A, rank), element(A, X), element(A, Y), X != Y-1.
% The person with the Dragon had their fortune told fourth.
:- tuple(I, dragon), tuple(J, 4), I!=J.
% The person with the Ox had their fortune told before the one whose Lucky Element is Metal.
:- tuple(I, ox), tuple(J, metal), tuple(I, X), tuple(J, Y), etype(A, rank), element(A, X), element(A, Y), X > Y.
% Ivana’s Lucky Animal is the Horse.
:- tuple(I, ivana), tuple(J, horse), I!=J.
% The person with the Lucky Element Water has the Cow.
:- tuple(I, water), tuple(J, cow), I!=J.
% The person with Lucky Element Water did not have their fortune told first.
:- tuple(I, water), tuple(I, 1).
% The person with Lucky Element Earth had their fortune told exactly two days after Philip.
:- tuple(I, earth), tuple(J, philip), tuple(I, X), tuple(J, Y), etype(A, rank), element(A, X), element(A, Y), X != Y+2.
3.1 The puzzle domain data
Each puzzle comes with a set of basic domain data which forms tuples. An example of this data is given above. Note that this is not the format in which the data is provided in the actual puzzles. It is assumed that the associations are exclusive, e.g. “earl” can own either a “dragon” or a “horse”, but not both. We assume this data is provided as input. There are several reasons for this assumption. The major reason is that not all the data is given in the actual natural language text describing the puzzle. In addition, the text does not associate actual elements, such as “earth”, with element types, such as “element”. If the text contains the number “6”, we might assume it is a rank, which, in fact, it is not. This domain data is encoded using the following format, where etype(A, t) stores the element type t, while element(A, X) is the predicate storing all the elements X of the type etype(A, t). The generic form of this encoding is given below.
% size of a tuple
index(1..n).
% number of tuples
eindex(1..m).
% type and lists of elements of that type,
% one element from each index forms a tuple
etype(1, type1).
element(1, el11). element(1, el12). ... element(1, el1m).
...
etype(n, typen).
element(n, eln1). element(n, eln2). ... element(n, elnm).
We now discuss this encoding in more detail. We want to encode all the elements of a particular type. The type is needed in order to do direct comparisons between the elements of some type. For example, when we want to specify that “Earl arrived immediately before the person with the Rooster.”, as encoded in the sample puzzle, we want to encode something like etype(A, rank), element(A, X), element(A, Y), X != Y-1, which compares the ranks X and Y of the two elements. The reason all the element types and elements have fixed numerical indices is to keep the encoding similar across the board and to not have to define additional grounding for the variables. For example, if we encoded elements as element(name, earl), then a variable ranging over the first argument in the encoding of a clue would need a defined domain that includes all the element types. These differ from puzzle to puzzle, and as such would have to be specifically added for each puzzle. By using the numerical indices across all puzzles, these are common across the board and we just need to specify that A is an index. In addition, to avoid permutations within the tuples, the following facts are generated, where tuple(I, X) is the predicate storing the elements X within a tuple I:
tuple(1,el11). ... tuple(m,el1m).
which, with the ranks chosen as the fixed type, yields for the particular puzzle
tuple(1,1). tuple(2,2). tuple(3,3). tuple(4,4). tuple(5,5).
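The domain data format above is regular enough to be produced mechanically. The following is a minimal sketch, not the authors' tooling: the function name `domain_facts`, its arguments and the `SAMPLE_TYPES` constant are hypothetical, and it assumes that every type has the same number of elements and that one chosen type is pinned to the tuples to break symmetry.

```python
def domain_facts(types, fixed_type):
    """Return ASP facts (as strings) for puzzle domain data.

    types: ordered list of (type_name, [elements]) pairs.
    fixed_type: name of the type whose k-th element is pinned to tuple k.
    """
    n = len(types)            # size of a tuple = number of element types
    m = len(types[0][1])      # number of tuples = elements per type
    facts = [f"index(1..{n}).", f"eindex(1..{m})."]
    for a, (name, elements) in enumerate(types, start=1):
        if len(elements) != m:
            raise ValueError("every type needs the same number of elements")
        facts.append(f"etype({a}, {name}).")
        facts.extend(f"element({a},{e})." for e in elements)
    # Symmetry breaking: pin the elements of one type to tuples 1..m.
    fixed = dict(types)[fixed_type]
    facts.extend(f"tuple({i},{e})." for i, e in enumerate(fixed, start=1))
    return facts


# Domain data of the sample puzzle (hypothetical constant name).
SAMPLE_TYPES = [
    ("name", ["earl", "ivana", "lucy", "philip", "tony"]),
    ("element", ["earth", "fire", "metal", "water", "wood"]),
    ("animal", ["cow", "dragon", "horse", "ox", "rooster"]),
    ("rank", [1, 2, 3, 4, 5]),
]

if __name__ == "__main__":
    print("\n".join(domain_facts(SAMPLE_TYPES, "rank")))
```

Pinning the ranks mirrors the tuple facts listed for the sample puzzle; any other type could be pinned instead without changing the set of solutions up to renaming of tuples.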
3.2 Generic modules and background knowledge
Given the puzzle domain data, we combine their encodings with additional modules responsible for generation and generic knowledge. In this work, we assume there are two types of generic modules available. The first one is responsible for generating all the possible solutions to the puzzle. We assume these are then pruned by the actual clues, which impose constraints on them. The following rules are responsible for the generation of all the possible tuples. Recall that we assume that all the elements are exclusive.
1{tuple(I,X):element(A,X)}1.
:- tuple(I,X), tuple(J,X), element(K,X), I != J.
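The generate-and-prune idea behind these rules can be illustrated outside ASP. The sketch below is an assumption-laden Python analogue, not the ASP program itself: it enumerates every exclusive assignment of names, elements and animals to the five ranks of the sample puzzle (the position in a permutation plays the role of the fixed rank) and discards candidates that violate a clue. Clue 6 is checked as a strict "before".

```python
from itertools import permutations

NAMES = ("earl", "ivana", "lucy", "philip", "tony")
ELEMENTS = ("earth", "fire", "metal", "water", "wood")
ANIMALS = ("cow", "dragon", "horse", "ox", "rooster")


def ranks(perm):
    """Map each item to its rank: position i of the permutation holds rank i + 1."""
    return {item: i + 1 for i, item in enumerate(perm)}


def solve():
    """Generate every candidate assignment and keep those passing all ten clues."""
    solutions = []
    for names in permutations(NAMES):
        n = ranks(names)
        if n["tony"] != 3:                          # clue 1
            continue
        for elements in permutations(ELEMENTS):
            e = ranks(elements)
            if e["wood"] != 5:                      # clue 2
                continue
            if n["earl"] != e["fire"]:              # clue 3
                continue
            if e["water"] == 1:                     # clue 9
                continue
            if e["earth"] != n["philip"] + 2:       # clue 10
                continue
            for animals in permutations(ANIMALS):
                a = ranks(animals)
                if n["earl"] != a["rooster"] - 1:   # clue 4
                    continue
                if a["dragon"] != 4:                # clue 5
                    continue
                if not a["ox"] < e["metal"]:        # clue 6
                    continue
                if n["ivana"] != a["horse"]:        # clue 7
                    continue
                if e["water"] != a["cow"]:          # clue 8
                    continue
                solutions.append(list(zip(range(1, 6), names, elements, animals)))
    return solutions


if __name__ == "__main__":
    for solution in solve():
        for row in solution:
            print(row)
```

Dropping any single clue check from the loop is a quick way to see how a missing constraint can leave several candidate solutions instead of one.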
In addition, a module with rules defining generic/background knowledge is used so as to provide higher level knowledge which the clues define. For example, a clue might discuss maximum, minimum, or genders such as woman. To be able to match these with the puzzle data, a set of generic rules defining these concepts is used, rather than adding them into the actual puzzle data. Thus rules defining concepts and knowledge such as maximum, minimum, within range, sister is a woman and others are added. For example, the concept “maximum” is encoded as:
notmax(A, X) :- element(A, X), element(A, Y), X != Y, Y > X.
maximum(A, X) :- not notmax(A,X), element(A,X).
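The two rules define "maximum" indirectly: an element is first marked as not maximal whenever a larger element of the same type exists, and the maximum is whatever remains unmarked. A minimal Python sketch of the same two-step reading, for the elements of a single type (the function name is hypothetical):

```python
def maximum(elements):
    """Mirror the notmax/maximum rules for the elements of one type."""
    # notmax(X) holds when some other element Y of the type is larger than X.
    notmax = {x for x in elements for y in elements if x != y and y > x}
    # maximum(X) holds when notmax(X) cannot be derived.
    return {x for x in elements if x not in notmax}


if __name__ == "__main__":
    print(maximum({1, 2, 3, 4, 5}))
```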
3.3 Extracting relevant facts from the puzzle clues
A sample of clues with their corresponding representations is given in the sample puzzle above. Let us take a closer look at the clue “Tony was the third person to have his fortune told.”, encoded as :- tuple(I, tony), tuple(J, 3), I!=J. This encoding specifies that if “Tony” is assigned to tuple I, while the rank “3” is assigned to a different tuple J, we obtain false. Thus this ASP rule limits all the models of its program to have “Tony” assigned to the same tuple as “3”. One might ask where the semantic data for “person” or “fortune told” are. They are missing from the translation since, with respect to the actual goal of solving the puzzle, they do not contribute anything meaningful. The fact that “Tony” is a “person” is inconsequential with respect to the solutions of the puzzle. With this encoding, we attempt to encode only the information relevant to the solutions of the puzzle. This keeps the structure of the encodings as simple and as general as possible. In addition, if the rule were encoded as :- tuple(I, tony), tuple(J, 3), person(tony), I!=J., the fact person(tony) would have to be added to the program in order for the constraint to have its desired meaning. However, this does not seem reasonable, as there is no reason to add it (other than for the clue to actually work), since “person” is not present in the actual data of the puzzle.
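The point about the extra body atom can be checked on a toy model. The sketch below is illustrative only: it represents a candidate solution as a mapping from elements to tuple indices, and it assumes a hypothetical extra atom person(tony) in the body of the constraint. Without the corresponding fact, the extended constraint never fires and so prunes nothing.

```python
def violates_clue(tuple_of):
    """:- tuple(I, tony), tuple(J, 3), I != J."""
    return tuple_of["tony"] != tuple_of[3]


def violates_clue_with_person(tuple_of, facts):
    """Same constraint with an extra body atom person(tony) (hypothetical)."""
    return ("person", "tony") in facts and tuple_of["tony"] != tuple_of[3]


if __name__ == "__main__":
    wrong = {"tony": 1, 3: 3}   # Tony sits in tuple 1, rank 3 in tuple 3
    print(violates_clue(wrong))
    print(violates_clue_with_person(wrong, set()))
    print(violates_clue_with_person(wrong, {("person", "tony")}))
```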
4 Translating Natural language to ASP
To translate the English descriptions into ASP, we adopt our approach in [Baral et al. 2011]. This approach uses lambda computations, generalization on demand and trivial semantic solutions together with learning. However, for this paper we had to adapt the approach to the ASP language and develop an ASP-λ-Calculus. An example of a clue translation using combinatorial categorial grammar [Steedman 2000] and ASP-λ-Calculus is given in Table 1.
The system uses the two inverse λ operators, $Inverse_L$ and $Inverse_R$, as given in [Baral et al. 2011] and [Gonzalez 2010]. Given λ-calculus formulas $H$ and $G$, these allow us to compute a λ-calculus formula $F$ such that $H = F@G$ and $H = G@F$. We now present one of the two inverse λ operators, $Inverse_R$, as given in [Baral et al. 2011]. For more details, as well as the other operator, please see [Gonzalez 2010]. We now introduce the different symbols used in the algorithm and their meaning:

Let $G$, $H$ represent typed λ-calculus formulas, $J^1, J^2, \ldots, J^n$ represent typed terms, $v_1$ to $v_n$, $v$ and $w$ represent variables and $\sigma_1, \ldots, \sigma_n$ represent typed atomic terms.

Let $f()$ represent a typed atomic formula. Atomic formulas may have a different arity than the one specified and still satisfy the conditions of the algorithm if they contain the necessary typed atomic terms.

Typed terms that are sub terms of a typed term $J$ are denoted as $J_i$.

If the formulas we are processing within the algorithm do not satisfy any of the conditions then the algorithm returns $null$.
Definition 1 (operator :)
Consider two lists of typed λ-elements A and B, $(a_i, \ldots, a_n)$ and $(b_j, \ldots, b_n)$ respectively, and a formula $H$. The result of the operation $H(A : B)$ is obtained by replacing $a_i$ by $b_i$, for each appearance of A in H.
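Definition 1 describes a positional replacement: each element of the list A that appears in the formula H is swapped for the element of B at the same position. The following sketch illustrates that reading under loud assumptions: formulas are modeled as nested Python tuples of strings, typing is ignored, and the function name `replace_terms` is hypothetical.

```python
def replace_terms(H, A, B):
    """Return H with every occurrence of A[i] replaced by B[i]."""
    mapping = dict(zip(A, B))

    def walk(term):
        if term in mapping:                 # whole (sub)term is listed in A
            return mapping[term]
        if isinstance(term, tuple):         # descend into compound terms
            return tuple(walk(sub) for sub in term)
        return term                         # atom not mentioned in A

    return walk(H)


if __name__ == "__main__":
    H = ("and", ("tuple", "I", "earl"), ("tuple", "J", "rooster"))
    print(replace_terms(H, ["earl", "rooster"], ["v1", "v2"]))
```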
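Next, we present the definition of one of the inverse operators.
Definition 2
The function $Inverse_R(H, G)$ is defined as:
Given $G$ and $H$:

1. If $G$ is $\lambda v.v@J$, set $F = Inverse_L(H, J)$.

2. If $J$ is a sub term of $H$ and $G$ is $\lambda v.H(J : v)$, then $F = J$.

3. If $G$ is not $\lambda v.v@J$, $J$ is a sub term of $H$ and $G$ is $\lambda w.H(J(J_1, \ldots, J_m) : w@J_p, \ldots, @J_q)$ with $1 \le p, q, s \le m$, then $F = \lambda v_1, \ldots, v_s.J(J_1, \ldots, J_m : v_p, \ldots, v_q)$.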
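The simplest situation handled by an inverse operator is the one where G is the sentence meaning H with one sub term abstracted into a bound variable; the missing argument F is then that sub term, since applying G to it gives back H. The sketch below illustrates only this case and is not the authors' implementation: formulas are untyped nested tuples, a lambda term is written ("lam", var, body), variable capture is not handled, and the example terms are toy stand-ins rather than the formulas of Table 1.

```python
def substitute(term, var, value):
    """Replace every occurrence of the variable `var` in `term` by `value`."""
    if term == var:
        return value
    if isinstance(term, tuple):
        return tuple(substitute(sub, var, value) for sub in term)
    return term


def apply_lambda(G, F):
    """Compute G @ F for G = ("lam", var, body) by substituting F for var."""
    tag, var, body = G
    assert tag == "lam"
    return substitute(body, var, F)


def inverse_abstracted(H, G):
    """Return F with G @ F == H when G is H with one sub term replaced by
    G's bound variable; return None when G does not have that shape."""
    tag, var, body = G
    if tag != "lam":
        return None
    candidates = []

    def match(b, h):
        if b == var:                        # the variable stands for h
            candidates.append(h)
            return True
        if isinstance(b, tuple) and isinstance(h, tuple) and len(b) == len(h):
            return all(match(x, y) for x, y in zip(b, h))
        return b == h

    if not match(body, H) or not candidates:
        return None
    first = candidates[0]
    # Every occurrence of the variable must stand for the same sub term.
    return first if all(c == first for c in candidates) else None


if __name__ == "__main__":
    H = ("before", ("rank", "earl"), ("rank", "rooster"))
    G = ("lam", "v", ("before", ("rank", "v"), ("rank", "rooster")))
    F = inverse_abstracted(H, G)
    print(F, apply_lambda(G, F) == H)
```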

Let us assume that in the example given in Table 1 the semantics of the word “immediately” is not known. We can use the inverse operators to obtain it as follows. Using the semantic representation of the whole sentence as given in Table 1, and that of the word “Earl”, we can use the respective operators to obtain the semantics of “arrived immediately before the man with the Rooster”.
Repeating this process recursively, we obtain the representation of “arrived immediately” and finally the desired semantics for “immediately”.
The input to the overall learning algorithm is a set of pairs $(S_i, L_i)$, $i = 1, \ldots, n$, where $S_i$ is a sentence and $L_i$ its corresponding logical form. The output of the algorithm is a PCCG defined by the lexicon $L_T$ and a parameter vector $\Theta_T$. As given by [Baral et al. 2011], the parameter vector $\Theta_i$ is updated at each iteration of the algorithm. It stores a real number for each item in the dictionary. The overall learning algorithm is given as follows:

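Input: A set of training sentences with their corresponding desired representations $S = \{(S_i, L_i) : i = 1, \ldots, n\}$ where $S_i$ are sentences and $L_i$ are desired expressions. Weights are given an initial value of 0.1. An initial feature vector $\Theta_0$. An initial lexicon $L_0$.

Output: An updated lexicon $L_{T+1}$. An updated feature vector $\Theta_{T+1}$.

Algorithm:
  Set $L_0$
  For t = 1 . . . T
    Step 1: (Lexical generation)
    For i = 1…n.
      For j = 1…n.
        Parse sentence $S_j$ to obtain $T_j$
        Traverse $T_j$
          apply $Inverse_L$, $Inverse_R$ and $GENERALIZE_D$ to find new λ-calculus expressions of words and phrases $\alpha$.
      Set $L_{t+1} = L_t \cup \alpha$
    Step 2: (Parameter Estimation)
    Set $\Theta_{t+1} = UPDATE(\Theta_t, L_{t+1})$ ^{6}
  return $GENERALIZE(L_T, L_T), \Theta(T)$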
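The control flow of the learning loop can be sketched independently of the parser and of the inverse operators. The code below is a skeleton under stated assumptions, not the authors' system: lexical generation and parameter estimation are passed in as caller-supplied callables, the final generalization step is omitted, and the toy callables in the usage part (including the initial weight of 0.1 chosen for the demo) are hypothetical stand-ins.

```python
def learn(training, lexicon, weights, find_new_entries, update_weights,
          rounds, initial_weight):
    """Alternate lexical generation and parameter estimation.

    training: list of (sentence, target meaning) pairs.
    lexicon: iterable of (word, meaning) entries known initially.
    weights: mapping from lexicon entries to real numbers.
    """
    lexicon = set(lexicon)
    weights = dict(weights)
    for _ in range(rounds):
        # Step 1: lexical generation. The pass over all sentences is repeated
        # so that entries learned from one sentence can unlock another.
        for _ in range(len(training)):
            for sentence, target in training:
                for entry in find_new_entries(sentence, target, lexicon):
                    if entry not in lexicon:
                        lexicon.add(entry)
                        weights[entry] = initial_weight
        # Step 2: parameter estimation.
        weights = update_weights(weights, lexicon, training)
    return lexicon, weights


def toy_new_entries(sentence, target, lexicon):
    """Toy stand-in for the inverse operators: if exactly one word and one
    meaning component are unaccounted for, pair them up."""
    known = dict(lexicon)
    unknown = [w for w in sentence if w not in known]
    leftover = set(target) - {known[w] for w in sentence if w in known}
    if len(unknown) == 1 and len(leftover) == 1:
        return {(unknown[0], leftover.pop())}
    return set()


def keep_weights(weights, lexicon, training):
    """Placeholder for parameter estimation: leaves the weights unchanged."""
    return weights


if __name__ == "__main__":
    data = [(("earl", "has", "fire"), {"earl'", "has'", "fire'"}),
            (("ivana", "has"), {"ivana'", "has'"})]
    start = {("earl", "earl'"), ("ivana", "ivana'")}
    lex, w = learn(data, start, {e: 0.1 for e in start},
                   toy_new_entries, keep_weights, rounds=1, initial_weight=0.1)
    print(sorted(lex))
```

In the toy run the first sentence has two unknown words and yields nothing on the first pass; once “has” is learned from the second sentence, the repeated pass recovers “fire” as well.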
A trained model was used to translate the clues from natural language into ASP. This model includes a dictionary with ASP-λ-calculus formulas corresponding to the semantic representations of words, each with its corresponding weight.
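The weights are what lets the model choose between several meanings of the same word. A minimal sketch of that choice, assuming the log-linear form used for PCCGs in [Zettlemoyer and Collins 2005], where a parse is scored by the summed weights of the lexical entries it uses; the entry labels "has/1" and "has/2" are hypothetical placeholders for two learned meanings of “has”.

```python
import math


def parse_probabilities(parses, theta):
    """Log-linear distribution over candidate parses.

    parses: list of parses, each a tuple of the lexicon entries it uses.
    theta: mapping from lexicon entry to its weight.
    """
    scores = [sum(theta[entry] for entry in parse) for parse in parses]
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    theta = {"has/1": 0.22, "has/2": 0.05, "earl": 0.1}
    parses = [("earl", "has/1"), ("earl", "has/2")]
    print(parse_probabilities(parses, theta))
```

When two parses differ only in the entry chosen for one word, their probability ratio depends only on the difference of the two entry weights, so the higher-weighted meaning is preferred.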
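Table 1: CCG and ASP-λ-Calculus derivation for the clue “Earl arrived immediately before the man with the Rooster.”, combining “earl arrived immediately”, “before” and “the man with the Rooster.”
Table 2: CCG and ASP-λ-Calculus derivation for the clue “Miss Hanson is withdrawing more than the customer whose number is 3989.”, combining “Miss Hanson is withdrawing”, “more” and “than the customer whose number is 3989.”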
Tables 1 and 2 give two sample translations of a sentence into answer set programming. In the second example, the parse for the “than the customer whose number is 3989.” part is not shown to save space. Also note that in general, names and several nouns were preprocessed and treated as a single noun due to parsing issues. The most noticeable fact is the abundance of expressions such as $\lambda x.x$, which basically direct the translation to ignore the word. The main reason for this is the nature of the translation we are performing. In terms of puzzle clues, many of the words do not really contribute anything significant to the actual clue. The important parts are the actual objects, “Earl” and “Rooster”, and their comparison, “arrived immediately before”. In a sense, the part “the man with the” does not provide much semantic contribution with regard to the actual puzzle solution. One of the reasons is the way the actual clue is encoded in ASP. A more complex encoding would mean that more words have significant semantic contributions; however, it would also mean that much more background knowledge would be required to solve the puzzles.
5 Illustration
We will now illustrate the learning algorithm on a subset of puzzle clues. We will use the puzzle sentences given in Table 3.
Donna Dale does not have green fleece.
Hy Syles has a brown fleece.
Flo Wingbrook’s fleece is not red.
Barbie Wyre is dining on hardboiled eggs.
Dr. Miros altered the earrings.
A garnet was set in Dr. Lukta’s piece.
Michelle is not the one liked by 22
Miss Hanson is withdrawing more than the customer whose number is 3989.
Albert is the most popular.
Pete talked about government.
Jack has a shaved mustache
Jack did not get a haircut at 1
The first open house was not listed for 100000.
The candidate surnamed Waring is more popular than the PanGlobal
Rosalyn is not the least popular.
Let us assume the initial dictionary contains the semantic entries for words given in Table 4. Please note that many of the nouns and noun phrases were preprocessed.
Table 4: Initial dictionary, with entries for verbs v, nouns n, and nouns n with general knowledge (for example sister, maximum, female, …).
The algorithm will then start processing sentences one by one and attempt to learn new semantic information. The algorithm will start with the first sentence, “Donna Dale does not have green fleece.” Using the inverse λ operators, the algorithm will find the semantics of “not”. In a similar manner it will continue through the sentences, learning new semantics of words. An interesting set of learned semantics, as well as weights for words with multiple semantics, is given in Table 5.
Table 5: Learned words and the weights of their semantic entries (one row per learned entry).
word  weight
not  0.28
not  0.3
has  0.22
has  0.05
has  0.05
has  0.05
popular  0.17
popular  0.03
a  0.1
not  0.1
on  0.1
the  0.1
in  0.1
by  0.1
most  0.1
about  0.1
shaved  0.1
at  0.1
first  0.1
for  0.1
least  0.1
more  0.1
6 Evaluation
We assume each puzzle is a pair P = (D, C), where D corresponds to the puzzle domain data and C corresponds to the clues of the puzzle given in simplified English. As discussed before, we assume the domain data D is given for each of the puzzles. A set of training puzzles is used to train the natural language model, which can be used to translate natural language sentences into their ASP representations. This model is then used to translate clues for new puzzles. The initial dictionary contained nouns and most verbs. A set of testing puzzles is validated by transforming the domain data into the proper format, adding the generic modules and translating the clues of each testing puzzle using the trained model.
To evaluate our approach, we considered 50 different logic puzzles from various magazines, such as [puz 2007; puz 2004; puz 2005]. We focused on evaluating the accuracy with which the actual puzzle clues were translated. In addition, we also verified the number of puzzles we solved. Note that in order to completely solve a puzzle, all the clues have to be translated accurately, as a missing clue means there will be several possible answer sets, which in turn will not give an exact solution to the puzzle. Thus if a system correctly translates a fraction p of the puzzle clues, and assuming the puzzles have on average 10 clues, then one would expect the overall accuracy of the system to be p^10; for p = 0.9 this is around 0.35.
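The expectation above follows from requiring every clue of a puzzle to be translated correctly. A minimal sketch of the arithmetic, assuming translation errors on different clues are independent (the function name is hypothetical):

```python
def expected_puzzle_accuracy(clue_accuracy, clues_per_puzzle=10):
    """Probability that all clues of a puzzle are translated correctly,
    assuming independent errors with the same per-clue accuracy."""
    return clue_accuracy ** clues_per_puzzle


if __name__ == "__main__":
    for p in (0.80, 0.87, 0.90, 0.95):
        print(p, round(expected_puzzle_accuracy(p), 3))
```

The steep drop for even small per-clue error rates is why puzzle-level accuracy is a much harsher measure than clue-level accuracy.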
To evaluate the clue translation, a set of clues was selected. Standard 10-fold cross validation was used. Precision measures the number of correctly translated clues, save for permutations in the body of the rules, or head of disjunctive rules. Recall measures the number of correct exact translations.
To evaluate the puzzles, we used the following approach. A number of puzzles were selected and all their clues formed the training data for the natural language module. The training data was used to learn the meaning of words and the associated parameters, and these were then used to translate the English clues to ASP. These were then combined with the corresponding puzzle domain data and the generic/background ASP module. The resulting program was solved using an ASP solver that extends clasp [Gebser et al. 2007]. Accuracy measured the number of correctly solved puzzles. A puzzle was considered correctly solved if it provided a single correct solution. If a rule provided by the clue translation from English into ASP was not syntactically correct, it was discarded. We did several experiments. Using the 50 puzzles, we did a 10-fold cross validation to measure the accuracy. In addition, we did additional experiments with 10, 15 and 20 puzzles manually chosen as training data. The manual choice was done with the intention of picking the training set that would entail the best training. In all cases, the C&C parser [Clark and Curran 2007] was used to obtain the syntactic parse tree.
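The cross-validation protocol can be sketched as follows. This is an illustration of a standard k-fold split at the level of whole puzzles, not the authors' script; how the folds were actually formed is not stated, so the round-robin assignment used here is an assumption.

```python
def k_fold_splits(items, k=10):
    """Yield (training, testing) lists; each item is tested exactly once."""
    folds = [items[i::k] for i in range(k)]     # round-robin fold assignment
    for i in range(k):
        testing = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, testing


if __name__ == "__main__":
    puzzles = list(range(1, 51))                # 50 puzzle identifiers
    for training, testing in k_fold_splits(puzzles):
        print(len(training), len(testing))
```

Splitting by puzzle rather than by clue keeps all clues of a tested puzzle out of the training data, which matches the requirement that a tested puzzle is solved only from a model trained on other puzzles.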
6.1 Results and Analysis
The results are given in Tables 6 and 7. The “10-fold” row corresponds to experiments with 10-fold cross validation, and “10s”, “15s” and “20s” to experiments where 10, 15 and 20 puzzles, respectively, were manually chosen as training data.
Table 6: Clue translation performance.
Precision  Recall  F-measure
87.64  86.12  86.87
Table 7: Puzzle solving performance.
Experiment  Accuracy
10-fold  28/50
10s  22/40
15s  24/35
20s  25/30
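The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F-measure as the harmonic mean of precision and recall and converts the solved-puzzle counts into rates; the dictionary and function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Solved puzzles / tested puzzles per experiment, as reported above.
SOLVED = {"10-fold": (28, 50), "10s": (22, 40), "15s": (24, 35), "20s": (25, 30)}


def solved_rates(counts):
    """Convert (solved, tested) counts into fractions."""
    return {name: solved / tested for name, (solved, tested) in counts.items()}


if __name__ == "__main__":
    print(round(f_measure(87.64, 86.12), 2))
    for name, rate in solved_rates(SOLVED).items():
        print(name, round(100 * rate, 1))
```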
The results for clue translation to ASP are comparable to those for translating natural language sentences in the Geoquery and Robocup domains, used by us in [Baral et al. 2011] and in similar works such as [Zettlemoyer and Collins 2007] and [Ge and Mooney 2009]. Our results are close to the values reported there for both the database domain and the Robocup domain.
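As discussed before, a clue translation accuracy of around 87% is expected to lead to a success rate of around 25% (0.87^10) for the actual puzzles. Our result of 28 out of 50 puzzles, or 56%, under 10-fold cross validation is significantly higher. It is interesting to note that as the number of puzzles used for training increases, so does the accuracy. However, there seems to be a ceiling: the best observed rate is 25 out of 30 puzzles, or about 83%.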
In general, the reason for not being able to solve a puzzle lies in the inability to correctly translate a clue. Incorrectly translated clues which are not syntactically correct are discarded, while for some clues the system is not capable of producing any ASP representation at all. There are several major reasons why the system fails to translate a clue. First, even with a large amount of training data, some puzzles simply have a relatively unique clue. For example, for the clue “The person with Lucky Element Earth had their fortune told exactly two days after Philip.” the “exactly two days after” part is very rare, and a similar clue, which discusses the distance of elements on a time line, is only present in two different puzzles. There were only 2 clues that contain “aired within n days of each other”, both in a single puzzle. If this puzzle is part of the training set, then, since we are not validating against it, it has no impact on the results. If it is one of the tested puzzles, this clue will essentially never be translated properly and as such the puzzle will never be correctly solved. In general, many of the clues required to solve the puzzles are very specific, and even with the addition of generic knowledge modules, the system is simply not capable of figuring them out. A solution to this problem might be to use more background knowledge and a larger training sample, or a specific training sample which focuses on various different types of clues.
In addition, when looking at Tables 1 and 5, many of the words are assigned very simple semantics that essentially do not contribute any meaning to the actual translation of the clue. Compared to the database query language and Robocup domains, there are several times as many simple representations. This leads to several problems. One of the problems is that the remaining semantics might be overfit to the particular training sentences. For example, for “aired within n days of each other” the only words with non-trivial semantics might be “within” and some number “n”, which in turn might not be generic for other sentences. The generalization approach adopted from [Baral et al. 2011] is unable to overcome this problem. The second problem is that a lot of words have these trivial semantics attached, even though they also have several other non-trivial representations. This causes problems with learning, and the trivial semantics may be chosen over the non-trivial one. Finally, some of the parses do not allow the proper use of inverse operators, or their use leads to very complex expressions with several applications of @. In Table 1, this can be seen by looking at the representation of the word “immediately”. While this particular case does not cause serious issues, it illustrates that when present several times in a sentence, the resulting expression can get very complex, leading to third or fourth order ASP-λ-Calculus formulas.
7 Conclusion and Future work
In this work we presented a learning approach to solve combinatorial logic puzzles in English. Our system uses an initial dictionary and general knowledge modules to obtain an ASP program whose unique answer set corresponds to the solution of the puzzle. Using a set of puzzles and their clues to train a model which can translate English sentences into logical form, we were able to solve many additional puzzles by automatically translating their clues, given in simplified English, into ASP. Our system used results and components from various AI subdisciplines including natural language processing, knowledge representation and reasoning, machine learning and ontologies, as well as the functional programming concept of λ-calculus. There are many ways to extend our work. The simplified English limitation might be lifted by better natural language processing tools and additional sentence analysis. We could also apply our approach to different types of puzzles. Modified encodings might yield a smaller variance in the results. Finally, we would like to submit that solving puzzles given in a natural language could be considered as a challenge problem for human-level intelligence, as it encompasses the various facets of intelligence that we listed earlier. In particular, one has to use a reasoning system and cannot substitute it with the surface-level analysis often used in information retrieval based methods.
Footnotes
1. An example is the well-known Zebra puzzle. http://en.wikipedia.org/wiki/Zebra_Puzzle
2. Our simplified English is different from “controlled” English in that it does not have a prespecified grammar. We only do some preprocessing to eliminate anaphora and some other aspects.
3. ASP-λ-Calculus is inspired by λ-Calculus. The classical logic formulas in λ-Calculus are replaced by ASP rules in ASP-λ-Calculus.
4. The people doing the preprocessing were not told of any specific subset of English or any “Controlled” English to use. They were only asked to simplify the sentences so that each sentence would translate to a single clue.
5. This is the operator that was used in this implementation. In a companion work we develop an enhancement of this operator which is proven sound and complete.
6. For details on the computation of the parameter vector, please see [Zettlemoyer and Collins 2005].
References
 Baral, C.; Gonzalez, M.; Dzifcak, J.; and Zhou, J. 2011. Using inverse λ and generalization to translate English to formal languages. In Proceedings of the International Conference on Computational Semantics, Oxford, England, January 2011.
 Baral, C. 2003. Knowledge Representation, Reasoning, and Declarative Problem Solving. Cambridge University Press.
 Clark, S., and Curran, J. R. 2007. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33.
 Finkel, R. A.; Marek, V. W.; and Truszczynski, M. 2002. Constraint lingo: A program for solving logic puzzles and other tabular constraint problems. In JELIA, 513–516.
 Ge, R., and Mooney, R. J. 2005. A statistical semantic parser that integrates syntax and semantics. In Proceedings of CoNLL, 9–16.
 Ge, R., and Mooney, R. J. 2009. Learning a compositional semantic parser using an existing syntactic parser. In Proceedings of ACL-IJCNLP, 611–619.
 Gebser, M.; Kaufmann, B.; Neumann, A.; and Schaub, T. 2007. clasp: A conflict-driven answer set solver. In LPNMR, 260–265.
 Gonzalez, M. A. 2010. An inverse lambda calculus algorithm for natural language processing. Master’s thesis, Arizona State University.
 2004. Logic problems. Penny Press.
 2005. Logic puzzles. Dell.
 2007. England’s best logic problems. Penny Press.
 Steedman, M. 2000. The syntactic process. MIT Press.
 Zettlemoyer, L., and Collins, M. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In AAAI, 658–666.
 Zettlemoyer, L., and Collins, M. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of EMNLP-CoNLL, 678–687.