Computational Red Teaming in a Sudoku Solving Context: Neural Network Based Skill Representation and Acquisition
Abstract
In this paper we provide an insight into skill representation, which is an essential part of the skill assessment stage of the Computational Red Teaming process. Skill representation is demonstrated in the context of the Sudoku puzzle: the skills humans use in Sudoku solving, along with their acquisition, are represented computationally in a cognitively plausible manner, using feedforward neural networks trained through backpropagation and supervised learning. The neural-network-based skills are then coupled with a constraint propagation computational Sudoku solver, in which the solving sequence is kept hardcoded while the skills are represented through neural networks. The paper demonstrates that the modified solver can achieve different levels of proficiency, depending on the amount of skill acquired through the neural networks. Results are encouraging for developing more complex skill and skill acquisition models usable in general frameworks related to the skill assessment aspect of Computational Red Teaming.
Keywords:
neural network, domain propagation, skill acquisition, supervised learning
1 Introduction
In Computational Red Teaming (CRT) a Red agent takes actions to challenge a Blue agent, with a variety of purposes. In the cognitive domain, one of these purposes, which has generated intense interest in the scientific community in recent years, is to force a human Blue agent to improve its skills. This process involves two major aspects. First, the Red must find the proper ways of action for challenging the Blue; this is the probing task. Second, in order to find those ways of action, the Red must first assess the Blue's skills to find its weaknesses and, hence, potential directions of improvement. This second aspect is skill assessment, in which the representation of the Blue's skills is essential.
In this paper we apply CRT to the Sudoku puzzle and focus on the representation of the skills used for solving a Sudoku game. We investigate the Sudoku literature in order to establish what skills humans apply to solve the puzzles, and then we create their computational representation in a manner that is cognitively plausible. We use feedforward neural networks (NN) to represent the skills, and we model the skill acquisition process through supervised learning and backpropagation. The NN-based skills are then embedded into a classic hardcoded constraint propagation Sudoku solver, endowing it with the ability to learn Sudoku skills through training. While the Sudoku solving sequence remains hardcoded, the computational solver uses at each predefined step the pattern recognition capability of the neural networks, and thus its proficiency varies based on the skills embedded in its structure. In order to demonstrate this we use two skill setups: a first one in which the neural networks can only detect the existence of a favourable pattern on the Sudoku board, and a second one in which the pattern can be both detected and localised. Simulation results show how the realistic skill-based solver can achieve different levels of proficiency in solving Sudoku in the two setups, with a higher level of proficiency reached for the first skill setup.
The paper is organised as follows. The second section presents the existing computational approaches to solving Sudoku and draws a conclusion on the lack of skill-based computational solvers. The third section shows how we choose from the range of human skills used in Sudoku solving, in order to transfer them into the proposed skill representation and acquisition model. The fourth section describes the methodology used for modelling the skills and the NN-based skill acquisition process. The fifth section presents and discusses the results of the experiments. The last section concludes the study and summarises the main findings.
2 Background on computational Sudoku solving
The existing computational Sudoku solvers focus mostly on reducing the implementation and computational complexity, and on solving the puzzle as a search/optimisation problem, without Sudoku domain-specific knowledge or concerns about cognitive plausibility.
From a computational perspective, several Sudoku solvers have been reported in the literature. The simplest, but also the least effective, is the backtracking solver, a brute-force method that explores the full space of possible grids and performs a backtracking-based depth-first search through the resultant solution tree [2]. Another simple solver is the “pencil and paper” algorithm [4], which visits cells in the grid and generates a search tree on the fly.
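The backtracking approach can be sketched in a few lines. The following Python fragment is our own minimal illustration, not the implementation evaluated in [2]; 0 marks an empty cell:

```python
def valid(grid, r, c, v):
    """Check the row, column and 3x3 box constraints for placing v at (r, c)."""
    if any(grid[r][j] == v for j in range(9)):
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(grid):
    """Depth-first search with backtracking over the first empty cell found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo the guess and backtrack
                return False  # no digit fits: an earlier guess was wrong
    return True  # no empty cell remains: the grid is solved
```

The recursion tries every admissible digit in turn, which is exactly why the method is exhaustive but slow on hard instances.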
In a strict mathematical view, the general formulation of Sudoku is considered a nondeterministic polynomial time (NP) problem. An open question still exists in the literature on whether or not Sudoku belongs to the subclass of NP-complete problems; however, most authors seem to be on the NP-completeness side [16, 17, 15, 6, 7]. Yato [16] and Yato and Seta [17] first demonstrated that the generalised Sudoku problem is NP-complete. Later, another approach [6] converted through reduction a Sudoku problem into a “Boolean Satisfiability” (SAT) problem. The approach allowed not only the solving, but also the analysis of a Sudoku puzzle's difficulty from the polynomial computation time perspective. A similar SAT-based solver was also proposed in [15], where the author describes a straightforward translation of a Sudoku grid into a propositional formula. The translation, combined with a general-purpose SAT solver, was able to solve puzzles within milliseconds. In addition, the author suggests that the algorithm can be extended to enumerate all possible solutions for Sudoku grids that are beyond the unique-solution grids posed for usual commercial puzzles.
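The translation into propositional logic in [15] can be illustrated as follows. This is our own sketch of one standard encoding, not the author's code, and the variable numbering scheme is an assumption made for illustration:

```python
def sudoku_cnf():
    """Generate the CNF clauses for an empty 9x9 Sudoku grid, as lists of
    DIMACS-style integer literals (a negative literal denotes negation)."""
    def var(r, c, v):
        # hypothetical numbering: variable for "cell (r, c) holds value v"
        return 1 + r * 81 + c * 9 + (v - 1)

    clauses = []
    for r in range(9):
        for c in range(9):
            # each cell holds at least one value ...
            clauses.append([var(r, c, v) for v in range(1, 10)])
            # ... and at most one (pairwise exclusion)
            for v in range(1, 10):
                for w in range(v + 1, 10):
                    clauses.append([-var(r, c, v), -var(r, c, w)])
    # each value appears at most once in every row, column and box
    units = [[(i, c) for c in range(9)] for i in range(9)]
    units += [[(r, i) for r in range(9)] for i in range(9)]
    units += [[(br + i, bc + j) for i in range(3) for j in range(3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    for unit in units:
        for v in range(1, 10):
            for a in range(9):
                for b in range(a + 1, 9):
                    clauses.append([-var(*unit[a], v), -var(*unit[b], v)])
    return clauses
```

A concrete puzzle then simply adds one unit clause per given clue; any off-the-shelf SAT solver can take the result.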
A distinct class of computational solvers is based on stochastic techniques. A solver based on swarm intelligence was proposed in [11]. The solver uses an artificial bee colony (ABC) for a guided exploration of the Sudoku grid search space. The algorithm mimics the foraging behaviour of bees, which is further used for building partial (local) solutions in the problem domain. The purpose of the algorithm is to minimise the number of duplicate digits found on each row and column. The authors compare the ABC algorithm with a Sudoku solver based on a classic genetic algorithm proposed by Mantere [9], and demonstrate that the ABC solver outperforms the GA solver significantly (i.e., on average 6243 processing cycles for ABC, versus 1238749 cycles for GA). In a different study, Perez and Marwala [12] proposed and compared four stochastic optimisation solvers: a Cultural Genetic Algorithm (CGA), a Repulsive Particle Swarm Optimisation (RPSO) algorithm, a Quantum Simulated Annealing (QSA) algorithm, and a hybrid method combining a Genetic Algorithm with Simulated Annealing (HGASA). The authors found that the CGA, QSA and HGASA were successful, with runtimes of 28, 65 and 1.447 seconds respectively, while the RPSO failed to solve any puzzle. The authors concluded that the very low runtime of HGASA was due to combining the parallel searching of GA with the flexibility of SA. At the same time, they suggested that RPSO was not able to solve the puzzles because its search operations could not be naturally adapted to generating better solutions.
Another class of computational solvers is based on neural networks [18, 8]; however, these solvers do not emphasise the cognitive plausibility of the neural networks, but rather their mathematical mechanism. In [18] the authors propose a Sudoku solver based on the Q'tron energy-driven neural-network model. They map the Sudoku constraints into the Q'tron's energy function, which is then minimised while local minima are avoided through a noise-injection mechanism. The authors show that the algorithm is totally unsuccessful in the absence of noise, while with noise injection it succeeds with runtimes within 1 second. They also demonstrate that the algorithm can be used not only for solving, but also for generating puzzles with a unique solution. In a different approach, Hopfield [8] considers that neural networks do not work well when applied to Sudoku, because they tend to make errors on the way. While [18] treats this problem by injecting noise into the Q'tron, Hopfield assumes that the search space during a Sudoku game can be mapped into an associative memory, which can be used for recognising the inherent errors and repositioning the NN representation of the Sudoku grid on the proper search path.
One particular class of computational Sudoku solvers, which is of major interest for our study, is the Constraint Propagation (CP) solvers. Several studies considered that the Sudoku puzzle can be treated as a Constraint Satisfaction Problem [14, 10] and, hence, can be solved using constraint programming techniques. Constraint Propagation solvers are purely computational methods, and the studies that proposed them followed the same purpose as the rest of the computational approaches, i.e. to produce proficient Sudoku solvers with minimal computational complexity and no domain knowledge. However, the constraint propagation processes described in both [14] and [10] are considered to be similar to the steps undertaken by human players when solving Sudoku. In his study [10], Norvig emphasises that the major task performed by humans when playing Sudoku is not to fill an empty cell, but to eliminate the multiple candidates for it, as a result of applying and propagating the Sudoku constraints. Yet, Norvig does not mention in which way the propagation of constraints resembles human thinking. Simonis [14] does, and states that the various Sudoku-related skills used by human players when trying to eliminate redundant candidates from cells are actually propagation schemes that contribute to a constraint propagation process which eventually solves the constraint satisfaction problem. Simonis considers that “they [human players] apply complex propagation schemes with names like X-Wing and Swordfish to find solutions of a rather simple looking puzzle called Sudoku. Unfortunately, they are not aware that this is constraint programming”. An even more advanced step towards demonstrating this concept is taken in [2], where the authors implement the constraint propagation based on a set of Sudoku skills (e.g. naked candidates, hidden candidates, Nishio guess).
The authors do not relate their algorithm to the constraint propagation formalism, and refer to it as “rule-based”, but they emphasise that it “consists of testing a puzzle for certain rules that […] eliminate candidate numbers. This algorithm is similar to the one human solver uses”.
In this study we build on the concepts proposed in this last class of computational Sudoku solvers, and we consider the skill-based approach to the constraint propagation problem as central to the skill representation aspect of CRT applied to Sudoku. Thus, in the following section we describe in detail the Sudoku constraints and some of the skills used by human players in solving the game.
3 The Sudoku game and skills
Sudoku is a number puzzle which in its best-known form consists of 81 cells contained in a 9×9 square grid that is further divided into nine 3×3 boxes of cells. The aim of the game is to fill all cells in the grid with single digits between 1 and 9, so that a given number appears no more than once in any unit it belongs to, where a unit can be a row, a column or a box. These are the Sudoku rules, or constraints. In general, the Sudoku problem can be seen as an n²×n² grid with n×n boxes of n×n cells. Denoting by x(i,j) the value in row i and column j, the constraints for a grid can then be expressed in general as follows:

Cell. A cell must be filled with exactly one digit: x(i,j) is an integer with a value between 1 and n².

Row. All values in a row must be unique: x(i,j) ≠ x(i,k) for all i, j, k in {1, …, n²} with j ≠ k.

Column. All values in a column must be unique: x(i,j) ≠ x(k,j) for all i, j, k in {1, …, n²} with i ≠ k.

Box. All values in a box must be unique: x(i,j) ≠ x(k,l) for any two distinct cells (i,j) and (k,l) belonging to the same n×n box.
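For the standard 9×9 instance (n = 3), the four constraints above can be checked directly on a filled grid. The following sketch, ours for illustration, mirrors the definitions:

```python
def satisfies_constraints(grid):
    """Check the cell, row, column and box constraints on a filled 9x9 grid,
    given as a list of nine rows of nine integers."""
    cells = all(grid[r][c] in range(1, 10) for r in range(9) for c in range(9))
    rows = all(len(set(grid[r])) == 9 for r in range(9))
    cols = all(len({grid[r][c] for r in range(9)}) == 9 for c in range(9))
    boxes = all(
        len({grid[br + i][bc + j] for i in range(3) for j in range(3)}) == 9
        for br in (0, 3, 6) for bc in (0, 3, 6))
    return cells and rows and cols and boxes
```

Uniqueness within a unit is checked by comparing the size of the unit's value set with 9, which holds exactly when no value repeats.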
3.1 Playing a game
In this study we consider the 9×9 version of Sudoku. A player applies the Sudoku constraints to empty cells and generates lists of candidates for the visited cells. This process is displayed in Figure 1, where Figure 1(a) shows the application of the rules to a selected cell, and Figure 1(b) shows the lists of candidates for all empty cells in the grid.
The purpose of the game is to apply Sudoku skills and propagate the Sudoku constraints in order to reduce these lists of candidates to unique candidates [14] for all empty cells in the grid, which amounts to filling the grid and, thus, solving the puzzle.
3.1.1 Performing Sudoku skills
In order to reduce the lists of candidates, players use various skills which propagate the domain. In [1, 5, 3] the authors note that players choose the skills based on the context perceived at the current move. The skills considered in this study belong to two categories, the naked and the hidden candidates, which suffice for solving a significant number of Sudoku games. More complex skills [13] can be involved for solving very difficult games; however, it is outside the scope of this study to investigate an exhaustive list of skills.
The set of naked candidate skills consists of finding and propagating naked singles and doubles (Figure 2). Recognising and propagating a naked single is the simplest skill: after the application of the Sudoku constraints, a cell has only one possible candidate. The value of this unique candidate solves the empty cell, and is propagated by removing the candidate value from the candidate lists of all other cells situated in the units the cell belongs to. A naked single is illustrated in pink in Figure 2(a). For the naked doubles, the lists of candidates are checked for a pair of cells in a Sudoku unit containing only the same two candidates. These candidates can only go in these two cells, thus the propagation is done by removing them from the candidate lists of all other unsolved cells in that unit. In Figure 2(b), the cells coloured in pink in column 3 show a naked double containing the candidate values (2, 3).
The set of hidden candidate skills consists of finding and propagating hidden singles and doubles (Figure 3). For the hidden single, if a candidate value appears in only one cell in a Sudoku unit (row, column or box), that value becomes the unique candidate for the cell, the rest being removed. Thus, the candidate becomes a naked single and further propagates the domain as a naked single. Figure 3(a) shows value 3 as a hidden single. For the hidden double, if a given pair of candidates appears in only two empty cells in a unit, then only these candidates must remain in these cells, the other candidates being removed. Thus, the hidden double becomes a naked double and further propagates the domain as a naked double. Figure 3(b) shows a hidden pair.
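For a concrete picture of these skills, the following sketch (ours, for illustration only, not the NN-based recognisers used later) scans a unit represented as a list of nine candidate sets, with solved cells holding the empty set:

```python
from itertools import combinations

def naked_singles(unit):
    """Indices of cells whose candidate list has been reduced to one value."""
    return [i for i, cands in enumerate(unit) if len(cands) == 1]

def naked_doubles(unit):
    """Pairs of cells in the unit sharing the same two-candidate list."""
    return [(i, j) for i, j in combinations(range(9), 2)
            if len(unit[i]) == 2 and unit[i] == unit[j]]

def hidden_singles(unit):
    """(cell, value) pairs where a value appears in exactly one cell's
    candidate list, among cells that are not yet solved."""
    found = []
    for v in range(1, 10):
        hits = [i for i, cands in enumerate(unit) if v in cands]
        if len(hits) == 1 and len(unit[hits[0]]) > 1:
            found.append((hits[0], v))
    return found
```

A hidden double detector would follow the same counting idea over pairs of values; we omit it for brevity.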
4 Methodology
In this study we consider that performing skills is subject to pattern recognition: the player must recognise the pattern of a skill in the lists of candidates in a unit in order to be able to apply that skill. We model the acquisition of skills through supervised training of feedforward neural networks with a backpropagation mechanism, one network for each skill. We treat two possible situations in skill acquisition. First, we train the ability to recognise the existence of a skill pattern in a Sudoku unit (row, column or box); we call this case “skill detection”. Second, we train the ability to recognise not only the existence of a skill pattern, but also the cells in the unit to which the skill is applicable; we call this case “skill localisation”. In the two cases the resultant neural networks have the same number of neurons in the input and hidden layers, and the same training sets for learning the skills, but they have different numbers of neurons in the output layer and, consequently, different target sets.
Figure 4 shows the encoding of candidate list information into the input layer of the neural networks. In a Sudoku unit, each of the nine cells can have a maximum of nine potential candidates, i.e. the digits from 1 to 9. However, at a certain step in the game the current candidate lists usually contain fewer than nine digits; such lists are depicted in the table in Figure 4. We encode the decimal values of the candidates into binary values, as also presented in the table. The total length of the binary encoded lists of candidates is 81; thus, we use neural networks with 81 neurons in the input layer.
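The encoding can be stated compactly. The helper below is a minimal sketch of the scheme described above; the bit ordering within each 9-bit cell block is our own assumption:

```python
def encode_unit(candidate_lists):
    """Encode the nine candidate lists of a unit into the 81-element binary
    input vector: bit 9*i + (v - 1) is 1 iff digit v is a candidate of cell i."""
    vec = [0] * 81
    for i, cands in enumerate(candidate_lists):
        for v in cands:
            vec[9 * i + (v - 1)] = 1
    return vec
```

Each cell thus contributes a fixed 9-bit slot, so the position of every set bit identifies both the cell and the candidate digit.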
4.1 Skill detection
For skill detection we adopt the network structure presented in Figure 5(a), with one node in the output layer, which indicates whether the pattern of a skill is present in a Sudoku unit. For each skill we use artificially generated training and target sets, as presented in Algorithm 1. For the skills treated in this study, a training sample is a binary vector with 81 elements corresponding to the 81 nodes in the input layer.
4.1.1 Single candidates
The training dataset is a binary matrix consisting of 162 samples. 81 samples correspond to all possible appearances of a naked or hidden single in a unit (i.e. there can be 81 single-candidate situations in a Sudoku column, row or box), for which the target value is 1. The other 81 samples do not contain the single-candidate pattern, hence the target value is 0.
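Since Algorithm 1 is not reproduced here, the following is a hedged reconstruction of the idea: each positive sample encodes exactly one single-candidate situation (9 cells × 9 values = 81), and the negatives are random candidate configurations guaranteed not to contain the pattern. The sampling choices for the negatives are ours, for illustration:

```python
import random

def single_training_set(seed=0):
    """Hedged reconstruction of Algorithm 1 for the single-candidate skill:
    81 positives (one per cell/value single situation) with target 1, and
    81 random negatives with target 0."""
    rng = random.Random(seed)
    samples, targets = [], []
    for i in range(9):                 # cell index within the unit
        for v in range(1, 10):         # the single candidate value
            vec = [0] * 81
            vec[9 * i + (v - 1)] = 1
            samples.append(vec)
            targets.append(1)
    for _ in range(81):                # negatives: no cell has exactly one candidate
        vec = [0] * 81
        for i in range(9):
            k = rng.choice([0, 2, 3, 4])
            for v in rng.sample(range(1, 10), k):
                vec[9 * i + (v - 1)] = 1
        samples.append(vec)
        targets.append(0)
    return samples, targets
```
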
4.1.2 Double candidates
The training dataset for the double candidate skills (naked and hidden doubles) is a binary matrix consisting of 2592 samples. 1296 samples correspond to all possible appearances of the skill in a unit (i.e. there can be 1296 naked double situations in a Sudoku column, row or box), for which the target value is 1. The other 1296 samples do not contain the skill pattern, hence the target value is 0.
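One way to arrive at the 1296 figure (our reading of the count): a double occupies one of C(9,2) = 36 pairs of cells in the unit and carries one of C(9,2) = 36 pairs of digit values:

```python
from itertools import combinations

cell_pairs = list(combinations(range(9), 2))       # where the double can sit
value_pairs = list(combinations(range(1, 10), 2))  # which two digits it holds
total = len(cell_pairs) * len(value_pairs)         # 36 * 36 = 1296
```
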
4.2 Skill localisation
For locating the patterns of skills we adopt the network structure presented in Figure 5(b), with nine nodes in the output layer. The nine nodes correspond to the nine cells in a Sudoku unit. Depending on which skill is subject to recognition, the nodes corresponding to the cells in which the skill pattern exists will fire. The training dataset for this case is generated in a manner similar to the previous case. The generation is presented in Algorithm 2, where the training matrix is similar to that from the skill detection case. The target set is a matrix with nine columns, one row per training sample, defined as in Equation 1.
t_k = 1 if the skill pattern is applicable to cell k of the unit, and t_k = 0 otherwise, for k = 1, …, 9.    (1)
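Equation 1 amounts to building a nine-element indicator vector per training sample; a minimal sketch, with cell indices starting at 0:

```python
def localisation_target(pattern_cells):
    """Equation 1: element k of the target vector fires iff the skill
    pattern is applicable to cell k of the unit."""
    return [1 if k in pattern_cells else 0 for k in range(9)]
```
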
4.3 Network and training settings
We use the standard sigmoid activation function for the nodes in the networks and the mean squared error (MSE) function for the subsequent gradient-based minimisation. The artificially generated training sets are split in ratios of 0.7, 0.15 and 0.15 for training, internal cross-validation and generalisation testing, respectively.
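These settings can be sketched as plain functions; the shuffling seed and the rounding of the split sizes are our own choices for illustration:

```python
import math
import random

def sigmoid(x):
    """Standard logistic activation applied at the network nodes."""
    return 1.0 / (1.0 + math.exp(-x))

def mse(outputs, targets):
    """Mean squared error between network outputs and target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def split(samples, seed=0):
    """Shuffle and split into 0.70 / 0.15 / 0.15 training, validation and
    generalisation-test subsets."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = int(0.70 * len(idx))
    n_val = int(0.15 * len(idx))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```
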
4.4 Skill aggregation: the solver
The constraint propagation side of the Sudoku solving is hardcoded. However, the recognition of the patterns for each of the four skills considered in the study is implemented using the neural networks; hence, the ability to recognise either the existence of a skill pattern (detection) or its location (localisation) depends on the ability of the neural networks to produce the desired output. This implementation, with the hardcoded solving sequence and the NN representation of the skills, is error-free from the Sudoku solving point of view, since it avoids situations in which multiple states coexist on one board, i.e. a single candidate and a double candidate simultaneously. Since the networks we propose are only meant to demonstrate the individual skills, they cannot treat cases where a combination of skills is present, or where the player must choose from multiple skills. Since this aspect was outside the scope of this study, we adopted a predetermined solving sequence implemented in the hardcoded constraint propagation module.
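The aggregation can be sketched as a fixed outer loop in which each skill module, standing in for a trained network plus its hardcoded propagation step, is applied to every unit until no candidate list changes. This is our own minimal illustration with a single skill (the naked single); the grid is represented as a dictionary of candidate sets:

```python
def units():
    """The 27 Sudoku units (rows, columns, boxes) as lists of (row, col) cells."""
    us = [[(r, c) for c in range(9)] for r in range(9)]
    us += [[(r, c) for r in range(9)] for c in range(9)]
    us += [[(br + i, bc + j) for i in range(3) for j in range(3)]
           for br in (0, 3, 6) for bc in (0, 3, 6)]
    return us

def naked_single_skill(cands, unit):
    """Detection plus propagation for the naked single, standing in for a
    trained network: a single-candidate cell removes its value from every
    other cell of the unit. Returns True if any candidate list changed."""
    changed = False
    for cell in unit:
        if len(cands[cell]) == 1:
            v = next(iter(cands[cell]))
            for other in unit:
                if other != cell and v in cands[other]:
                    cands[other].discard(v)
                    changed = True
    return changed

def solve_with_skills(cands, skills):
    """Hardcoded solving sequence: apply the skills in a fixed order over
    all units, and repeat until a full pass produces no change."""
    progress = True
    while progress:
        progress = False
        for unit in units():
            for skill in skills:
                if skill(cands, unit):
                    progress = True
    return cands
```

In the actual solver the detector is the trained network's output on the encoded unit; here the detection condition is hardcoded so that the loop structure can be shown in isolation.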
5 Results and discussion
Figure 6 shows the results of the training process in the skill detection case. The training of each of the four skills is considered finished when the best validation performance is reached. Table 1 and Figure 7 present the game solving results for both the trained and the untrained skills, where the untrained skills are the skills acquired after one epoch of neural network training. The results demonstrate how the proficiency of the skill-based solver improves with the acquisition of skills. The table shows the difference in the number of detected skill patterns between the two cases, while the figure shows the result of game solving in terms of the degrees of freedom. We demonstrate that, with trained NN-based skill detection, the solver is able to complete the proposed Sudoku game, provided that the rest of the solving mechanism is hardcoded.
            Number of      Number of      Number of       Number of       Game result
            naked singles  naked doubles  hidden singles  hidden doubles  (degrees of freedom)
untrained   2              0              17              1               153
trained     54             5              50              1               0
For the skill localisation case, the results of the training process are shown in Figure 8. Similar to the skill detection case, the training of each of the four skills is considered finished when the best validation performance is reached. Table 2 and Figure 9 present the results of game solving for trained and untrained skills, where the untrained skills are the skills acquired after one epoch of neural network training. The results demonstrate again that the proficiency of the skill-based solver improves with the acquisition of skills. The proficiency in this case is lower, a result which is expected given that the solver must recognise not only the existence of a skill pattern in a unit, but also its location. Results show an improvement in the number of recognised skills, which subsequently leads to fewer degrees of freedom, but the solver still does not reach the end of the proposed Sudoku game. However, since proficiency is not the purpose of this study, we emphasise the improvement resulting from skill acquisition through NN training.
            Number of      Number of      Number of       Number of       Game result
            naked singles  naked doubles  hidden singles  hidden doubles  (degrees of freedom)
untrained   0              1              6               1               181
trained     31             3              19              1               73
6 Conclusions
In this paper we focused on the skill assessment aspect of the CRT process, for which the representation of skills is a central and essential issue. We investigated this using the Sudoku puzzle, by introducing a plausible representation of Sudoku skills and by modelling the process of acquiring these skills. We used feedforward neural networks with a backpropagation mechanism for training the skills, and we tested the resultant skills in a cognitively plausible skill-based computational Sudoku solver.
The results of Sudoku game solving demonstrated the plausibility of using skills in computational Sudoku solvers, and also demonstrated the concept of skill acquisition in relation to the proficiency of the solver. We found that a skill-based computational Sudoku solver can achieve certain levels of proficiency by learning the Sudoku skills using neural networks. Results are encouraging for developing more complex skill and skill acquisition models usable in more general frameworks related to the skill assessment stage of the Computational Red Teaming process.
Acknowledgement
This project is supported by the Australian Research Council Discovery Grant DP140102590, entitled “Challenging systems to discover vulnerabilities using computational red teaming”.
This is a preprint of an article published in Proceedings in Adaptation, Learning and Optimization, vol 5, Springer. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-27000-5_26
References
 [1] Aslaksen, H.: The mathematics of sudoku (2014), http://www.math.nus.edu.sg/aslaksen/sudoku/
 [2] Berggren, P., Nilsson, D.: A study of Sudoku solving algorithms. Master’s thesis, Royal Institute of Technology, Stockholm (2012), http://www.csc.kth.se/utbildning/kth/kurser/DD143X/dkand12/Group6Alexander/final/Patrik_Berggren_David_Nilsson.report.pdf
 [3] Chadwick, S.B., Krieg, R.M., Granade, C.E.: Ease and toil: Analyzing sudoku. UMAP Journal p. 363 (2007)
 [4] Crook, J.F.: A pencil-and-paper algorithm for solving sudoku puzzles. Notices of the AMS 56(4), 460–468 (2009)
 [5] Davis, T.: The math of sudoku (2008), http://math.stanford.edu/circle/notes08f/sudoku.pdf
 [6] Ercsey-Ravasz, M., Toroczkai, Z.: The chaos within sudoku. Sci. Rep. 2 (2012), http://dx.doi.org/10.1038/srep00725
 [7] Goldberg, P.W.: NP-completeness of sudoku (2015), http://www.cs.ox.ac.uk/people/paul.goldberg/FCS/sudoku.html
 [8] Hopfield, J.J.: Searching for memories, sudoku, implicit check bits, and the iterative use of not-always-correct rapid neural computation. Neural Computation 20(5), 1119–1164 (2008), http://dx.doi.org/10.1162/neco.2007.0906345
 [9] Mantere, T., Koljonen, J.: Solving, rating and generating sudoku puzzles with GA. In: IEEE Congress on Evolutionary Computation. pp. 1382–1389. IEEE (2007)
 [10] Norvig, P.: Solving every sudoku puzzle, http://norvig.com/sudoku.html
 [11] Pacurib, J.A., Seno, G.M.M., Yusiong, J.P.T.: Solving sudoku puzzles using improved artificial bee colony algorithm. In: Fourth International Conference on Innovative Computing, Information and Control. pp. 885–888. IEEE (2009)
 [12] Perez, M., Marwala, T.: Stochastic optimization approaches for solving sudoku. arXiv preprint arXiv:0805.0697 (May 2008), http://arxiv.org/abs/0805.0697v1
 [13] Pitts, J.: Master Sudoku. Teach yourself, McGrawHill Companies, Inc. (2010)
 [14] Simonis, H.: Sudoku as a constraint problem. In: CP Workshop on modeling and reformulating Constraint Satisfaction Problems. vol. 12, pp. 13–27. Citeseer (2005)
 [15] Weber, T.: A SAT-based sudoku solver. In: The 12th International Conference on Logic for Programming Artificial Intelligence and Reasoning. pp. 11–15 (2005), http://www.cs.miami.edu/~geoff/Conferences/LPAR12/ShortPapers.pdf
 [16] Yato, T.: Complexity and completeness of finding another solution and its application to puzzles. Master's thesis, Graduate School of Science, University of Tokyo (2003), http://www-imai.is.s.u-tokyo.ac.jp/~yato/data2/MasterThesis.pdf
 [17] Yato, T., Seta, T.: Complexity and completeness of finding another solution and its application to puzzles. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 86(5), 1052–1060 (2003)
 [18] Yue, T.W., Lee, Z.C.: Sudoku solver by Q'tron neural networks. Lecture Notes in Computer Science, vol. 4113, pp. 943–952. Springer Berlin Heidelberg (2006), http://dx.doi.org/10.1007/11816157_115