Sudoku/Thorkil Naur

From HaskellWiki

Describing the Su Doku solver and related code[edit]

2006-Jun-03 / Thorkil Naur

Introduction: Representing the board[edit]

SuDoku.hs contains functions for solving SuDoku puzzles. The board under consideration is represented by a rectangular array of fields. (These "fields" are called "cells" in other SuDoku-related writings, sorry about that.) For each field, the set of possible elements that may be placed in the field is maintained.

An ordinary SuDoku puzzle uses a 9*9 array and the possible elements are the digits 1, 2, .. 9. Initially, the fields with given contents in the SuDoku puzzle hold a set containing just the single given element. And the blank fields of the puzzle hold the set of all possible elements.

Solving: Reduction and trials[edit]

Solving works by reducing the size of the sets that represent possible field contents in a sequence of steps. A solution has been found when each set contains exactly one element.

If reduction alone fails to produce a solution, a single trial step is taken: A field with (the fewest number of) non-single possiblities is selected and each possibility is tried one by one. For each possibility, renewed reduction is performed recursively, possibly leading to further trial steps and so on, until a solution is found. The solutions produced by each trial are combined, thus all solutions will be found.

This solution strategy is implemented by the sdkSolve function.

Constraint sets[edit]

Basic reduction is performed on one constraint set at a time. A constraint set is some selection of fields that must fulfill the basic SoDuko condition: That each field must contain precisely one element. And that all fields in the constraint set must contain different elements.

For ordinary SoDuko puzzles, the constraint sets are the nine rows, the nine columns and the nine 3*3 squares that subdivide the 9*9 board.

The SuDoku solver is parameterized over the constraint sets. It is not restricted to solve ordinary 9*9 SuDoku puzzles.

The basic board is assumed to be a rectangular array of fields. A constraint set is then represented by the list of indexes that address the fields of the constraint set. For example, the upper right 3*3 square constraint set of an ordinary 9*9 SuDoku puzzle is represented by the list (array indexes are 0-based):

  [(r,c) | r <- [0..2], c <- [6..8]]
   = [(0,6),(0,7),(0,8),(1,6),(1,7),(1,8),(2,6),(2,7),(2,8)]

The function sdkConstraintSetsTraditional computes constraint sets for a class of SuDoku puzzles called "traditional" which is a n^2*n^2 square that generalizes the ordinary SuDoku puzzle that has n=3. The function sdkConstraintSetsClover computes constraint sets for a class of puzzles called "clover" that consists of 5 n^2*n^2 squares in a clover-like pattern with a single n*n square of each "leaf" square overlapping the "center" square. The clover pattern is illustrated for n=3 as follows:

   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
               . . . . . . . . .
               . . . . . . . . .
               . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
   . . . . . . . . . . . . . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .
   . . . . . . . . .       . . . . . . . . .

Additional classes of SuDoku puzzles may be defined similarly.

Reducing the board[edit]

A single round of board reduction is performed by doing a basic reduction on each constraint set (the function sdkBoardReduceAll). Reduction closure, then, repeatedly performs single board reductions until the board changes no more (the function sdkBoardClosure).

Basic reduction strategies: By field and by element[edit]

Two different basic reduction strategies are implemented for a constraint set: Reduction by field and reduction by element.

Field reduction is based on the consideration of subsets of fields of the constraint set to be reduced. For some subset of the fields, the union of the elements that can be contained in any of the fields is computed. If the number of elements in this union is less or equal to the size of the subset, the subset is said to be reducing and the elements of the union cannot be used outside this subset of fields. Hence the elements of the union can be removed from all the other fields of the constraint set.

As an obvious special case of this, if a single field can only contain a single element, then that element can be excluded from all the other fields.

Field reduction uses this idea repeatedly: The non-empty subsets of the fields of the constraint set are considered in order of increasing size. If a subset is found to be reducing, the element union is excluded from the remaining fields of the constraint set and field reduction is repeated on this remaining set of fields. Field reduction is done by function sdkReduceByField.

(Technical note: sdkListLengthLE is used rather than comparison of list lengths to improve performance.)

For element reduction, subsets of elements are considered. For some subset of elements, the set of fields that can contain any of the elements is computed. If the number of such fields equals the number of elements in the subset, any additional elements recorded as candidates for these fields can be excluded.

For example, consider the subset {1,2} of elements and assume that we find that the only fields that may contain 1 or 2 have {1,2} and {1,2,3} as candidate element sets. Then 1 and 2 must occupy precisely these two fields and 3 can be excluded as a possibility for the second field.

As for field reduction, element reduction uses this idea repeatedly, considering at each step the shortest reducing element subset (function sdkReduceByElement).

Field reduction and element reduction produce identical results[edit]

It has come as a significant surprise to me that the two reduction strategies, although they certainly appear quite different to me, nevertheless produce identical results. At least in the practical circumstances of solving real SuDoku puzzles. And since field reduction is much faster than element reduction, the present version of the code uses field reduction only to find the final solutions: The call of sdkSolve in sdkProcessSuDoku uses by=sdkReduceByField. The element reduction code has been retained, however, for possible further experiments.

Command line interface to sdkSolve[edit]

The Main module contained in SdkMSol2.hs provides a command line interface to the SuDoku solver. Here is an example of a call of the solver that uses runhugs:

runhugs SdkMSol2 \
  tn1 \
  Traditional 3 \
  -#123456789 \
  1-53---9- \
  ---6----- \
  ------271 \
  82------- \
  ---487--- \
  ------53- \
  23------- \
  --7-59--- \
  --6---8-4

This is a command for a Linux system. A similar command can be used under Windows. (I usually keep these long commmands in individual command files, i.e. Shell scripts or BAT files.) And, of course, SdkMSol2 can be compiled by GHC and run in a similar manner.

The first parameter "tn1" is simply a name for the puzzle used in the output. In this case, the first (and so far only) SuDoku puzzle invented by Yours Truly. The next parameter "Traditional" selects the class of puzzle desired. Following parameters may be used to qualify specific puzzles within the class, in this case the "3" which selects the ordinary 9*9 SuDoku puzzle.

The remaining parameters lay out the initial contents of the rectangular board. Parameters of the form

  <character>#<list of characters>

("#" is used rather than "=" which gives problems in BAT files) are definitions where the first character is defined to represent the set of characters listed. Thus, the parameter "-#123456789" defines "-" to represent the set "123456789". Parameters not of this form represent rows of the, generally rectangular, board: In such a row, characters that have been "defined" represent their set, other characters represent the set containing that single character.

(The command line interface thus restricts the elements used by the solver to be individual characters. However, the solver itself is not restricted to handle single-character elements.)

Notice that the puzzle class with parameters represents the constraint sets used independently of the board, whose shape is basically defined by the number of rows specified and their maximum length. Conflicts are not verified explicitly, but generally result in (sometimes obscure) runtime errors.

Selected lines from the output of running the above command:

SdkMSol2: 2006-Jun-04 13.37
SdkMSol2(tn1)
  Height 9 Width 9
  1-53---9-
  ---6-----
  ------271
  82-------
  ---487---
  ------53-
  23-------
  --7-59---
  --6---8-4

tn1: Problem:
          1 123456789         5         3 123456789 123456789 123456789         9 123456789
  123456789 123456789 123456789         6 123456789 123456789 123456789 123456789 123456789
  123456789 123456789 123456789 123456789 123456789 123456789         2         7         1
          8         2 123456789 123456789 123456789 123456789 123456789 123456789 123456789
  123456789 123456789 123456789         4         8         7 123456789 123456789 123456789
  123456789 123456789 123456789 123456789 123456789 123456789         5         3 123456789
          2         3 123456789 123456789 123456789 123456789 123456789 123456789 123456789
  123456789 123456789         7 123456789         5         9 123456789 123456789 123456789
  123456789 123456789         6 123456789 123456789 123456789         8 123456789         4

tn1: Closure by field:
...
      1    78     5     3    27    28     4     9     6
     79   479     2     6  1479    14     3     8     5
    369  4689  3489   589    49   458     2     7     1
      8     2   139   159  1369  1356  1679     4    79
   3569  1569   139     4     8     7   169    16     2
    679 14679   149   129  1269   126     5     3     8
      2     3    18    18    46    46    79     5    79
      4    18     7   128     5     9    16   126     3
     59    59     6     7   123   123     8    12     4


tn1: Solutions:
  1 7 5 3 2 8 4 9 6
  9 4 2 6 7 1 3 8 5
  3 6 8 5 9 4 2 7 1
  8 2 9 1 3 5 6 4 7
  6 5 3 4 8 7 9 1 2
  7 1 4 9 6 2 5 3 8
  2 3 1 8 4 6 7 5 9
  4 8 7 2 5 9 1 6 3
  5 9 6 7 1 3 8 2 4


tn1: Total of 1 solutions
tn1: CPU Used: 53.4

Thus, the problem is stated (in two different ways), the sequence of intermediate boards resulting from performing field reduction on the initial board is printed and, finally, all the solutions are printed.

(And since field reduction is also carried out, starting from the initial board, as part of finding the solution, field reduction of the initial board is actually carred out twice. Something that may be changed by editing the sdkProcessSuDoku function.)

Here is an example of a command for solving a 16*16 SuDoku puzzle:

runhugs SdkMSol2 \
  MasterSUDOKU_106_20051208 \
  Traditional 4 \
  -#0123456789ABCDEF \
  --6--37--B-C--E- \
  -C4A08D-5-7-2--- \
  -9--F----0--C1-- \
  -1E8--6-2D-4--5- \
  F-CBE---12-39-7- \
  -A8-B-----F7---1 \
  7-D4-9---C-0---8 \
  --------D-9--B-- \
  --2--0-9-------- \
  E---8-A---4-12-B \
  4---67-----F-90- \
  -0-FD-C1---E38-5 \
  -B--4-97-E--80A- \
  --F0--2----9--3- \
  ---E-F-3-A0B6C1- \
  -4--C-0--81--F--

The final example uses the clover class:

runhugs SdkMSol2 \
  JuleSuDoku_Fem_I_En_20060102 \
  Clover 3 \
  -#123456789 \
  ---6327-1...--7---23- \
  --24--5-6...4--1----- \
  --85---9-...1-5-8--7- \
  94----127...-6----8-5 \
  1-5------...-3---714- \
  87--9----...-41-62-97 \
  5873-------3----3--19 \
  --4--8----4----52--6- \
  2---65-----9---7--58- \
  ......5-----12-...... \
  ......-8---67-9...... \
  ......--1372---...... \
  64---9----14----4---- \
  --263-------------74- \
  -93-4----93----5-3--2 \
  ----68-7-...2-5--8-9- \
  ---------...84--9-57- \
  5--3--84-...-----5-26 \
  --54-36--...7-------- \
  2--9-5--7...-5438-61- \
  -3782----...--37694--

Notice the dots (".") used. They do not enter any constraint sets, but are needed to fill out the board.

Performance[edit]

A single performance improvement has been implemented: It is the matter mentioned above that relates to sdkListLengthLE. I have no idea how the performance of this solver compares to other solvers. But it is clear that considerable improvement, I would guess certainly one, perhaps even two, orders of magnitude, would result from using bit vectors rather than Haskell Strings to represent sets.

Experimenting with reduction strategies[edit]

The main module in T44.hs uses miscellaneous functions in SuDoku.hs to exhaustively check whether field reduction is identical to element reduction under reasonable assumptions.

Test[edit]

The Main module in T40.hs contains test code.

Final words[edit]

Best wishes Thorkil Naur