Oracle PL/SQL Sudoku Solver

There comes a time when the monotony of work needs a little refreshing diversion. Some of us like to relax (or get frustrated) by whiling away the time with a Sudoku puzzle, which is fast becoming a pandemic craze.

I enjoy solving these puzzles myself, but with a lust for a challenge, I could not resist the opportunity to develop a Sudoku puzzle solver written entirely in Oracle SQL and PL/SQL.

In developing this solver I have also shown how a relatively simple solution can solve a reasonably difficult problem.

The solver will solve most of The Times Fiendish puzzles. The solver application does not support guessing, so you may find puzzles that it will not solve, though there is an argument that a Sudoku puzzle should be solvable without guessing.

I must point out that I am not a Sudoku fanatic and am not trying to present myself as someone who knows much about Sudoku puzzles. There are plenty of websites that do that with a huge amount of detail and attention. I am merely demonstrating how SQL can be used to solve an everyday problem.

The algorithms I have used in my solution were inspired by a very interesting article on Sudoku puzzle solving written by Paul Stephens, a UK Computer Journalist.

The Concepts

The approach used to solve a Sudoku puzzle with a computer is not as complex as you might think. However, it is probably not one you would contemplate using to solve a puzzle manually, at least not until you get stuck. The approach is based on determining all possible candidates for each cell and then applying some basic rules and pattern matching to eliminate redundant candidates. As candidates are removed, the remaining candidates become certainties by being either:

1. the only remaining candidate in a cell; or

2. the only remaining candidate for a given value in a box, row or column.

So the algorithms do two jobs: identifying certainties and eliminating candidates. It is just a case of iterating through these two steps, applying the available algorithms, until no more candidates remain. It is worth stressing again that the solver does not support guessing, so some tricky puzzles may not be solvable.

Solution Design

I have chosen SQL and Oracle PL/SQL to solve the puzzle, firstly because PL/SQL is the programming language I use every day, and secondly because SQL is probably the best-suited language for implementing solutions based on set theory. The solver is implemented as a single stored procedure supported by a number of internal procedures and functions. In hindsight, it would probably have been more correct to develop it as a package. However, since it is not really losing out on the benefits that packages bring, I have kept it as a stored procedure unless it evolves into something more sophisticated.

The algorithms are written purely in SQL; PL/SQL is only used to control the iterations and the solution output. Any other procedural language, such as Java, VB, PHP or Delphi, could be used to implement this solution with the same SQL embedded. The SQL itself is Oracle-specific (mainly because of the DECODE function), but it should adapt quite easily to other flavours of SQL that support the same degree of sub-querying.

Being a SQL solution, the design consists of a few database tables. It is a very simple design: one table, the answers table, holds dynamic data, and the others hold static reference data such as cells and combinations.

Puzzle Numbers - This table holds a list of all the possible numbers in a cell. For a standard Sudoku puzzle, this is only 9 rows holding values 1 to 9.

Cells - This table holds an entry for each cell in the puzzle. It uses row id (horizontal line of 9 cells) and column id (vertical line of 9 cells) as the primary key, and box id (3x3 cells) as an attribute, e.g. cell 2,3 is in box 1, cell 6,6 is in box 5, etc.

The table only holds 81 rows of data for a standard Sudoku puzzle. The table definition is:

Name                    Null?    Type
----------------------- -------- ----------------
ROW_ID                  NOT NULL NUMBER
COL_ID                  NOT NULL NUMBER
BOX_ID                  NOT NULL NUMBER
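As a sketch only, the cells table can be built and filled in a few lines. This runs equivalent DDL against SQLite through Python's standard sqlite3 module rather than Oracle. The box numbering formula (boxes 1 to 9 counted left to right, top to bottom) is an assumption, chosen to match the examples above: cell 2,3 in box 1 and cell 6,6 in box 5. The helper names `build_cells` and `box_of` are hypothetical.

```python
import sqlite3


def build_cells(conn):
    """Create and fill the static cells table (81 rows)."""
    conn.execute("""
        CREATE TABLE cells (
            row_id NUMBER NOT NULL,
            col_id NUMBER NOT NULL,
            box_id NUMBER NOT NULL,
            PRIMARY KEY (row_id, col_id))""")
    # Assumed numbering: boxes 1-3 across the top, 4-6 in the middle, 7-9 at the bottom
    conn.executemany(
        "INSERT INTO cells (row_id, col_id, box_id) VALUES (?, ?, ?)",
        [(r, c, (r - 1) // 3 * 3 + (c - 1) // 3 + 1)
         for r in range(1, 10) for c in range(1, 10)])


def box_of(conn, row_id, col_id):
    """Look up the box that a cell belongs to."""
    return conn.execute(
        "SELECT box_id FROM cells WHERE row_id = ? AND col_id = ?",
        (row_id, col_id)).fetchone()[0]


conn = sqlite3.connect(":memory:")
build_cells(conn)
print(box_of(conn, 2, 3), box_of(conn, 6, 6))  # 1 5
```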

Combinations - This table contains only static data and is used by the set-based algorithm. It lists every member number of each combination set. A combination set is a group of unique numbers that form a set with n members. For example, numbers 3, 5 and 7 belong to the set "357"; 5, 6 and 8 all belong to the "568" set; 4 and 6 belong to the "46" set; and so on. These are combinations rather than permutations (where the order of the numbers would be significant), so set "234" is the same as "432". To keep the set identifier consistent, it is always written with its numbers in ascending order, i.e. "234" rather than "432".

Name                    Null?    Type
----------------------- -------- ----------------
SET_ID                  NOT NULL NUMBER
PUZZLE_NUMBER           NOT NULL NUMBER
SET_SIZE                NOT NULL NUMBER

The primary key of this table is the combination of set id and member number. The set size is there to filter on combination sets of a particular size. These are some examples of the combinations table entries:

Set Id   Puzzle Number   Set Size
------   -------------   --------
34       3               2
34       4               2
123      1               3
123      2               3
123      3               3
124      1               3
124      2               3
124      4               3

There is an entry for every possible set of combined numbers, from sets of 2 numbers through to 5 numbers, i.e. 12 through to 56789.
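The article does not say how the combinations table was populated. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module in place of Oracle and a hypothetical `build_combinations` helper. It generates every set of 2 to 5 numbers with `itertools.combinations`, which yields members in ascending order and so always produces identifiers such as "234" rather than "432".

```python
import sqlite3
from itertools import combinations


def build_combinations(conn, min_size=2, max_size=5):
    """Fill the combinations table: one row per (set, member number)."""
    conn.execute("""
        CREATE TABLE combinations (
            set_id        NUMBER NOT NULL,
            puzzle_number NUMBER NOT NULL,
            set_size      NUMBER NOT NULL,
            PRIMARY KEY (set_id, puzzle_number))""")
    rows = []
    for size in range(min_size, max_size + 1):
        # Members come out in ascending order, so the id is e.g. 234, never 432
        for members in combinations(range(1, 10), size):
            set_id = int("".join(str(m) for m in members))
            rows.extend((set_id, m, size) for m in members)
    conn.executemany("INSERT INTO combinations VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
build_combinations(conn)
print(conn.execute(
    "SELECT COUNT(DISTINCT set_id), COUNT(*) FROM combinations").fetchone())  # (372, 1458)
```

Under these assumptions the table holds 372 sets (36 + 84 + 126 + 126) and 1,458 member rows.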

Answers - This table is the only dynamic table. It holds the starting numbers for the puzzle plus all initial candidates for any cells not containing a solved number.

Name                    Null?    Type
----------------------- -------- ----------------
PUZZLE_ID               NOT NULL NUMBER
STEP_NO                 NOT NULL NUMBER
ROW_ID                  NOT NULL NUMBER
COL_ID                  NOT NULL NUMBER
PENCIL_MARK_IND         NOT NULL NUMBER
ANSWER                  NOT NULL NUMBER
BOX_ID                           NUMBER

Candidates are referred to as pencil marks, where a pencil mark goes through several states as indicated by the pencil_mark_ind flag, which has the following values:

Value   Meaning
-----   -------
-1      The pencil mark has been rubbed out.
0       The pencil mark is now a certainty (all starting numbers are set to this value).
1       Normal pencil mark setting, i.e. a candidate.
2       A temporary pencil mark setting identified by an algorithm step. This can then be used to identify other candidates to be rubbed out.

The initial puzzle clues (which have pencil_mark_ind = 0) are loaded using an Excel spreadsheet I put together, which generates a series of insert statements from numbers entered into a grid. The data is inserted into a staging table called puzzle_load and copied to the answers table when the solver executes.
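The spreadsheet and the puzzle_load layout are not shown, so the following is only a sketch of what the copy into answers might look like. Clues become certainties (pencil_mark_ind = 0). Every other cell receives a normal pencil mark (1) for each number not already given in its row, column or box. The sketch makes several assumptions:

- SQLite via Python's sqlite3 module stands in for Oracle.
- A dictionary of clues stands in for puzzle_load.
- step_no is set to 0.
- `create_schema` and `load_puzzle` are hypothetical helper names.

The original solver may instead insert all nine numbers and let the elimination steps rub out the conflicts.

```python
import sqlite3


def create_schema(conn):
    """Static reference tables plus an empty answers table."""
    conn.executescript("""
        CREATE TABLE puzzle_numbers (puzzle_number NUMBER NOT NULL PRIMARY KEY);
        CREATE TABLE cells (row_id NUMBER NOT NULL, col_id NUMBER NOT NULL,
                            box_id NUMBER NOT NULL, PRIMARY KEY (row_id, col_id));
        CREATE TABLE answers (puzzle_id NUMBER NOT NULL, step_no NUMBER NOT NULL,
                              row_id NUMBER NOT NULL, col_id NUMBER NOT NULL,
                              pencil_mark_ind NUMBER NOT NULL,
                              answer NUMBER NOT NULL, box_id NUMBER);
    """)
    conn.executemany("INSERT INTO puzzle_numbers VALUES (?)",
                     [(n,) for n in range(1, 10)])
    conn.executemany("INSERT INTO cells VALUES (?, ?, ?)",
                     [(r, c, (r - 1) // 3 * 3 + (c - 1) // 3 + 1)
                      for r in range(1, 10) for c in range(1, 10)])


def load_puzzle(conn, puzzle_id, clues):
    """clues maps (row, col) -> starting number (a stand-in for puzzle_load)."""
    # Starting numbers are certainties (pencil_mark_ind = 0); step_no = 0 is assumed
    conn.executemany("""
        INSERT INTO answers
        SELECT ?, 0, row_id, col_id, 0, ?, box_id FROM cells
        WHERE row_id = ? AND col_id = ?""",
        [(puzzle_id, n, r, c) for (r, c), n in clues.items()])
    # Each empty cell gets a pencil mark (1) for every number that no clue
    # in the same row, column or box already uses
    conn.execute("""
        INSERT INTO answers
        SELECT :p, 0, c.row_id, c.col_id, 1, n.puzzle_number, c.box_id
        FROM cells c CROSS JOIN puzzle_numbers n
        WHERE NOT EXISTS (
            SELECT 1 FROM answers g
            WHERE g.puzzle_id = :p AND g.pencil_mark_ind = 0
            AND g.row_id = c.row_id AND g.col_id = c.col_id)
        AND NOT EXISTS (
            SELECT 1 FROM answers g
            WHERE g.puzzle_id = :p AND g.pencil_mark_ind = 0
            AND g.answer = n.puzzle_number
            AND (g.row_id = c.row_id OR g.col_id = c.col_id
                 OR g.box_id = c.box_id))""",
        {"p": puzzle_id})


conn = sqlite3.connect(":memory:")
create_schema(conn)
load_puzzle(conn, 1, {(1, 1): 5, (1, 2): 3})
print([n for (n,) in conn.execute(
    "SELECT answer FROM answers WHERE row_id = 1 AND col_id = 3 "
    "AND pencil_mark_ind = 1 ORDER BY answer")])  # [1, 2, 4, 6, 7, 8, 9]
```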

Algorithms

The solver uses only four algorithms. A typical "The Times Fiendish" puzzle usually needs all of them to be solved, whereas a "The Times Difficult" puzzle can usually be solved with just the first two.

The algorithms are split into two categories - those that identify certainties in cells (first two) and those that eliminate candidates (last two).

The algorithms are explained in more detail in the following pages:

1. Singles - Cell

2. Singles - Box

3. Cross Hatching

4. Partial Member Sets

The algorithms are not applied strictly in the sequence shown above. The Singles algorithms are always reapplied whenever the other algorithms have removed any candidates. Since the Singles algorithms can remove candidates themselves, they are repeated until no more certainties are identified. Then the next algorithm in the sequence is applied and iterated until no more candidates are eliminated, followed by the Singles algorithms again.
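The iteration schedule described above can be sketched as a small driver loop. This is one reading of the text, not the author's PL/SQL. The names `solve`, `singles`, `eliminators` and `scripted` are hypothetical, and each algorithm is assumed to be a callable that reports how many certainties it found or candidates it rubbed out. Restarting from the first eliminator after any progress is also an assumption.

```python
from typing import Callable, Sequence

Step = Callable[[], int]


def solve(singles: Sequence[Step], eliminators: Sequence[Step]) -> int:
    """Drive the algorithms; returns how many eliminator passes removed candidates.

    singles     -- steps returning the number of certainties found
                   (Singles - Cell, Singles - Box)
    eliminators -- steps returning the number of candidates rubbed out, in order
                   (e.g. cross hatching n=2, n=3, then member sets n=2..5)
    """
    def run_singles() -> None:
        # Repeat the Singles algorithms until a full pass finds no new certainty
        while sum(step() for step in singles) > 0:
            pass

    productive = 0
    run_singles()
    i = 0
    while i < len(eliminators):
        if eliminators[i]() > 0:
            productive += 1
            run_singles()
            i = 0   # candidates changed: start again from the first eliminator
        else:
            i += 1  # nothing removed: try the next algorithm in sequence
    return productive


def scripted(*counts):
    """Stand-in for a real step: returns the given counts, then 0 forever."""
    it = iter(counts)
    return lambda: next(it, 0)


print(solve([scripted(3, 1), scripted(2)], [scripted(0, 4), scripted(5)]))  # 2
```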

Conclusion

This Sudoku solver has resulted in a fairly simple design with very little coding, and it demonstrates how powerful SQL is at solving complex problems. It is far too easy for programmers to go off and develop a solution in a procedural programming language rather than attempting to push the programming logic into SQL. They may choose a language because they are more familiar with it, or because they consider it more in keeping with the latest technological preachings. I am not saying all problems are better solved with SQL. However, I often find solutions far more complex than they need to be, mainly because of a reluctance to use SQL efficiently and effectively, or a reluctance to accept that SQL is a superior programming option.

Oracle PL/SQL Sudoku Solver - Algorithms

1. Singles - cell

If there is only a single remaining pencil mark in a cell, then it must be a certainty. Any other candidates with the same value in the same box, row or column can be eliminated.

In the above example, the number 7 in red is a single candidate in a cell. The blue numbers can then be rubbed out.

This is the SQL statement used to identify the candidates that meet the criteria.

UPDATE answers
SET    pencil_mark_ind = 0
WHERE  pencil_mark_ind > 0
AND    puzzle_id = p_puzzle_id
AND    (row_id, col_id) IN (
         SELECT a.row_id, a.col_id
         FROM   answers a
         WHERE  a.puzzle_id = p_puzzle_id
         AND    a.pencil_mark_ind > 0
         GROUP BY a.row_id, a.col_id
         HAVING COUNT(*) = 1);

Each time new certainties are found, any other candidates remaining in the same cell can be rubbed out. So can any candidates with the same value in the same box, row or column but in a different cell.

This eliminates the candidates in the same cell...

UPDATE answers a
SET    pencil_mark_ind = -1
WHERE  a.pencil_mark_ind > 0
AND    a.puzzle_id = p_puzzle_id
AND    (a.row_id, a.col_id) IN (
         SELECT a2.row_id, a2.col_id
         FROM   answers a2
         WHERE  a2.pencil_mark_ind = 0
         AND    a2.puzzle_id = p_puzzle_id);

This eliminates the candidates in the same box, row, or column (iterated three times for box, row, and column)...

FOR i IN 1..3 LOOP
  UPDATE answers a
  SET    a.pencil_mark_ind = -1
  WHERE  a.pencil_mark_ind > 0
  AND    a.puzzle_id = p_puzzle_id
  AND    (a.answer, DECODE(i, 1, a.box_id, 2, a.row_id, 3, a.col_id)) IN (
           SELECT a2.answer, DECODE(i, 1, a2.box_id, 2, a2.row_id, 3, a2.col_id)
           FROM   answers a2
           WHERE  a2.puzzle_id = p_puzzle_id
           AND    a2.pencil_mark_ind = 0
           AND    (a2.row_id != a.row_id OR a2.col_id != a.col_id));
END LOOP;
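To see the three Singles - Cell statements working together, here is a sketch that runs them against a handful of hand-made pencil marks. It uses SQLite through Python's sqlite3 module instead of Oracle. SQLite has no DECODE, so the box/row/column loop substitutes the column name into the statement instead. The helpers `make_answers`, `mark` and `singles_cell` are hypothetical.

```python
import sqlite3


def make_answers(cells):
    """Load (row, col, candidates) as normal pencil marks (1) for puzzle 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE answers (puzzle_id NUMBER, step_no NUMBER,
                    row_id NUMBER, col_id NUMBER, pencil_mark_ind NUMBER,
                    answer NUMBER, box_id NUMBER)""")
    for r, c, nums in cells:
        box = (r - 1) // 3 * 3 + (c - 1) // 3 + 1
        conn.executemany("INSERT INTO answers VALUES (1, 0, ?, ?, 1, ?, ?)",
                         [(r, c, n, box) for n in nums])
    return conn


def mark(conn, r, c, n):
    """Current pencil_mark_ind of number n in cell (r, c)."""
    return conn.execute("SELECT pencil_mark_ind FROM answers "
                        "WHERE row_id = ? AND col_id = ? AND answer = ?",
                        (r, c, n)).fetchone()[0]


def singles_cell(conn, p):
    """One pass of Singles - Cell; returns the number of new certainties."""
    found = conn.execute("""
        UPDATE answers SET pencil_mark_ind = 0
        WHERE pencil_mark_ind > 0 AND puzzle_id = :p
        AND (row_id, col_id) IN (
            SELECT a.row_id, a.col_id FROM answers a
            WHERE a.puzzle_id = :p AND a.pencil_mark_ind > 0
            GROUP BY a.row_id, a.col_id HAVING COUNT(*) = 1)""", {"p": p}).rowcount
    # Rub out any other candidates in a cell that now holds a certainty
    conn.execute("""
        UPDATE answers SET pencil_mark_ind = -1
        WHERE pencil_mark_ind > 0 AND puzzle_id = :p
        AND (row_id, col_id) IN (
            SELECT a2.row_id, a2.col_id FROM answers a2
            WHERE a2.pencil_mark_ind = 0 AND a2.puzzle_id = :p)""", {"p": p})
    # Rub out the same value elsewhere in the box, row and column
    # (the loop over column names replaces DECODE(i, 1, box_id, 2, row_id, 3, col_id))
    for unit in ("box_id", "row_id", "col_id"):
        conn.execute(f"""
            UPDATE answers SET pencil_mark_ind = -1
            WHERE pencil_mark_ind > 0 AND puzzle_id = :p
            AND (answer, {unit}) IN (
                SELECT a2.answer, a2.{unit} FROM answers a2
                WHERE a2.puzzle_id = :p AND a2.pencil_mark_ind = 0
                AND (a2.row_id != answers.row_id OR a2.col_id != answers.col_id))""",
            {"p": p})
    return found


# (1,1) holds a lone 7; other 7s share its box (2,2), row (1,5) or column (4,1)
conn = make_answers([(1, 1, [7]), (1, 5, [7, 2]), (4, 1, [7, 3]),
                     (2, 2, [7, 4]), (9, 9, [7, 8])])
print(singles_cell(conn, 1))                     # 1
print(mark(conn, 1, 5, 7), mark(conn, 9, 9, 7))  # -1 1
```

Calling `singles_cell` again promotes the pencil marks now left alone in cells (1,5), (4,1) and (2,2). This is why the Singles steps are repeated until no new certainties appear.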

2. Singles - box

If there is only a single remaining pencil mark for a given number in a box (there may be other pencil marks in its cell), then it must be a certainty, and all other pencil marks in that cell can be rubbed out. All other pencil marks of the same number in the same box, row or column can also be rubbed out.

In the above example the numbers in red are single candidates in a box. The blue numbers can then be rubbed out.

UPDATE answers
SET    pencil_mark_ind = 0
WHERE  pencil_mark_ind > 0
AND    puzzle_id = p_puzzle_id
AND    (box_id, answer) IN (
         SELECT a.box_id, a.answer
         FROM   answers a
         WHERE  a.puzzle_id = p_puzzle_id
         AND    a.pencil_mark_ind > 0
         GROUP BY a.box_id, a.answer
         HAVING COUNT(*) = 1);

As with the Singles - Cell algorithm, once any certainties have been identified, the candidates they rule out need to be rubbed out using the same two steps described before.

Just by iterating through the Singles - Cell and Singles - Box steps over and over again, it is possible to solve a good proportion of the puzzle (see below). Anything easier than a "The Times Fiendish" puzzle will usually be solved completely with these two algorithms.

The figure above shows how far the example puzzle gets with just these two algorithms.
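A similar sketch for Singles - Box, again in SQLite via Python's sqlite3 module, with the hypothetical helpers `make_answers`, `mark` and `singles_box`. It shows only the box-single update and the rub-out within the same cell. Rubbing out the same number elsewhere in the box, row and column would reuse the loop from Singles - Cell.

```python
import sqlite3


def make_answers(cells):
    """Load (row, col, candidates) as normal pencil marks (1) for puzzle 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE answers (puzzle_id NUMBER, step_no NUMBER,
                    row_id NUMBER, col_id NUMBER, pencil_mark_ind NUMBER,
                    answer NUMBER, box_id NUMBER)""")
    for r, c, nums in cells:
        box = (r - 1) // 3 * 3 + (c - 1) // 3 + 1
        conn.executemany("INSERT INTO answers VALUES (1, 0, ?, ?, 1, ?, ?)",
                         [(r, c, n, box) for n in nums])
    return conn


def mark(conn, r, c, n):
    """Current pencil_mark_ind of number n in cell (r, c)."""
    return conn.execute("SELECT pencil_mark_ind FROM answers "
                        "WHERE row_id = ? AND col_id = ? AND answer = ?",
                        (r, c, n)).fetchone()[0]


def singles_box(conn, p):
    """Promote the only pencil mark for a number within a box to a certainty,
    then rub out the other pencil marks in that cell. Returns certainties found."""
    found = conn.execute("""
        UPDATE answers SET pencil_mark_ind = 0
        WHERE pencil_mark_ind > 0 AND puzzle_id = :p
        AND (box_id, answer) IN (
            SELECT a.box_id, a.answer FROM answers a
            WHERE a.puzzle_id = :p AND a.pencil_mark_ind > 0
            GROUP BY a.box_id, a.answer HAVING COUNT(*) = 1)""", {"p": p}).rowcount
    conn.execute("""
        UPDATE answers SET pencil_mark_ind = -1
        WHERE pencil_mark_ind > 0 AND puzzle_id = :p
        AND (row_id, col_id) IN (
            SELECT a2.row_id, a2.col_id FROM answers a2
            WHERE a2.pencil_mark_ind = 0 AND a2.puzzle_id = :p)""", {"p": p})
    return found


# Box 1: the number 4 is pencilled in only at cell (2,1)
conn = make_answers([(1, 1, [1, 2]), (1, 2, [1, 3]), (2, 1, [2, 3, 4])])
print(singles_box(conn, 1))                      # 1
print(mark(conn, 2, 1, 4), mark(conn, 2, 1, 2))  # 0 -1
```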

3. Pencil Mark Cross Hatching

Take each candidate pair or candidate triple of the same value that forms a line along a row or column within a box and does not appear in any other cell of that box. For each one, eliminate the candidates of the same value in the same row or column but in a different box.

The numbers in red are examples of candidate pairs in a box. They must be the only two in a box and form a line either in a column or a row. The numbers in blue are the candidates that can be eliminated.

The algorithm also applies to candidate triples in a box forming a line in either a row or a column.

Each time numbers are eliminated, the first two (Singles) algorithms need to be applied to identify certainties. With the above example, it is now possible to iterate through the Singles algorithms alone and solve the rest of the puzzle.

The update is performed once for pairs (n=2) and once for triples (n=3). This statement will mark the candidate pairs and triples with the temporary pencil mark value of 2.

UPDATE answers
SET    pencil_mark_ind = 2
WHERE  pencil_mark_ind = 1
AND    puzzle_id = p_puzzle_id
AND    (box_id, answer) IN (
         SELECT a.box_id, a.answer
         FROM   answers a
         WHERE  a.pencil_mark_ind > 0
         AND    a.puzzle_id = p_puzzle_id
         GROUP BY a.box_id, a.answer
         HAVING COUNT(*) = n
         AND    (   MIN(a.col_id) = MAX(a.col_id)
                 OR MIN(a.row_id) = MAX(a.row_id)));

Immediately after each update above, affected candidates can be eliminated (by setting the pencil mark to -1, i.e. rubbed out). The statement is looped through twice, once for rows, and once for columns...

FOR i IN 1..2 LOOP
  UPDATE answers a
  SET    pencil_mark_ind = -1
  WHERE  a.pencil_mark_ind > 0
  AND    a.puzzle_id = p_puzzle_id
  AND    (DECODE(i, 1, a.row_id, 2, a.col_id), a.answer) IN (
           SELECT DECODE(i, 1, a2.row_id, 2, a2.col_id), a2.answer
           FROM   answers a2
           WHERE  a2.pencil_mark_ind = 2
           AND    a2.puzzle_id = p_puzzle_id
           AND    a2.box_id != a.box_id
           GROUP BY a2.box_id, DECODE(i, 1, a2.row_id, 2, a2.col_id), a2.answer
           HAVING COUNT(*) = n);
END LOOP;

After the candidates have been removed, the Singles algorithms can be applied and iterated continuously until no more certainties can be identified.
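The cross hatching statements can be tried the same way. This is a SQLite sketch using Python's sqlite3 module and the hypothetical helpers `make_answers`, `mark` and `cross_hatch`. The loop over row_id and col_id replaces DECODE(i, 1, row_id, 2, col_id). The final reset of temporary marks (2) back to normal pencil marks (1) is an assumption, since the text does not say when those marks are cleared. In the sample data the only two 5s in box 1 lie in row 1. The 5s in row 1 of boxes 2 and 3 should therefore be rubbed out, while the 5 in row 3 survives.

```python
import sqlite3


def make_answers(cells):
    """Load (row, col, candidates) as normal pencil marks (1) for puzzle 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE answers (puzzle_id NUMBER, step_no NUMBER,
                    row_id NUMBER, col_id NUMBER, pencil_mark_ind NUMBER,
                    answer NUMBER, box_id NUMBER)""")
    for r, c, nums in cells:
        box = (r - 1) // 3 * 3 + (c - 1) // 3 + 1
        conn.executemany("INSERT INTO answers VALUES (1, 0, ?, ?, 1, ?, ?)",
                         [(r, c, n, box) for n in nums])
    return conn


def mark(conn, r, c, n):
    """Current pencil_mark_ind of number n in cell (r, c)."""
    return conn.execute("SELECT pencil_mark_ind FROM answers "
                        "WHERE row_id = ? AND col_id = ? AND answer = ?",
                        (r, c, n)).fetchone()[0]


def cross_hatch(conn, p, n):
    """Cross hatching for pairs (n=2) or triples (n=3); returns candidates rubbed out."""
    # Mark n same-value candidates that are alone in their box and lie in one line
    conn.execute("""
        UPDATE answers SET pencil_mark_ind = 2
        WHERE pencil_mark_ind = 1 AND puzzle_id = :p
        AND (box_id, answer) IN (
            SELECT a.box_id, a.answer FROM answers a
            WHERE a.pencil_mark_ind > 0 AND a.puzzle_id = :p
            GROUP BY a.box_id, a.answer
            HAVING COUNT(*) = :n
            AND (MIN(a.col_id) = MAX(a.col_id) OR MIN(a.row_id) = MAX(a.row_id)))""",
        {"p": p, "n": n})
    removed = 0
    for line in ("row_id", "col_id"):
        # Same value, same row/column, but in a different box from the marked line
        removed += conn.execute(f"""
            UPDATE answers SET pencil_mark_ind = -1
            WHERE pencil_mark_ind > 0 AND puzzle_id = :p
            AND ({line}, answer) IN (
                SELECT a2.{line}, a2.answer FROM answers a2
                WHERE a2.pencil_mark_ind = 2 AND a2.puzzle_id = :p
                AND a2.box_id != answers.box_id
                GROUP BY a2.box_id, a2.{line}, a2.answer
                HAVING COUNT(*) = :n)""", {"p": p, "n": n}).rowcount
    # Assumption: temporary marks go back to normal pencil marks afterwards
    conn.execute("UPDATE answers SET pencil_mark_ind = 1 "
                 "WHERE pencil_mark_ind = 2 AND puzzle_id = :p", {"p": p})
    return removed


conn = make_answers([(1, 1, [5, 8]), (1, 2, [5, 9]), (2, 3, [8, 9]),
                     (1, 5, [5, 6]), (3, 6, [5, 7]), (1, 9, [5, 4])])
print(cross_hatch(conn, 1, 2))  # 2
```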

4. Partial Member Sets

This algorithm works with sets of unique candidate numbers, where a set has n unique members. The algorithm looks for n cells in a box, row or column that contain only members of the set, whether each cell holds the full set or only part of it. Suppose it finds exactly n such cells, each holding up to n set members and no non-members. It can then eliminate any other occurrences of those numbers from the other cells in that box, row or column.

The value of n can range from 2 upwards, but a maximum set size of 5 seems to be the most practical limit.

For example, a row could have three cells containing the candidates 5,7; 3,5; and 3,7 (see the numbers in red below). These all belong to set 357, so candidates with the same values in other cells of the same row (see the numbers in blue below) can be eliminated.
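The test behind this example can be sketched in plain Python, outside the database, to make the logic of the larger SQL statement easier to follow. This is an illustration, not the author's implementation. `eliminate_member_sets` is a hypothetical name, the unit is a dictionary of candidate sets for one box, row or column, and the two extra cells (1,4) and (1,5) are invented for the demonstration. The function looks for a combination set whose members account for all the candidates of exactly n cells, then removes those numbers from the other cells.

```python
from itertools import combinations


def eliminate_member_sets(unit, n):
    """Partial Member Sets for one box, row or column.

    unit maps a cell (row, col) to its set of candidates and is updated in place;
    returns {cell: candidates rubbed out}.
    """
    removed = {}
    for members in combinations(range(1, 10), n):
        s = set(members)
        # Cells whose candidates all belong to the set (full or partial);
        # such a cell can hold at most n candidates
        inside = [cell for cell, cands in unit.items() if cands and cands <= s]
        # The set must be spread across exactly n cells of the unit
        if len(inside) != n:
            continue
        # Rub out the set's numbers from every other cell in the unit
        for cell, cands in unit.items():
            if cell not in inside and cands & s:
                removed.setdefault(cell, set()).update(cands & s)
                cands -= s
    return removed


# The row from the text: 5,7 ; 3,5 ; 3,7 all belong to set 357
row = {(1, 1): {5, 7}, (1, 2): {3, 5}, (1, 3): {3, 7},
       (1, 4): {3, 5, 8}, (1, 5): {1, 7, 9}}
print(eliminate_member_sets(row, 3))          # {(1, 4): {3, 5}, (1, 5): {7}}
print(sorted(row[(1, 4)]), sorted(row[(1, 5)]))  # [8] [1, 9]
```

Requiring exactly n qualifying cells mirrors the HAVING COUNT(*) = n test in the SQL. Fewer cells prove nothing, and more would mean the puzzle state is contradictory.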

As with the previous algorithm, once candidates have been removed, the Singles algorithms need to be applied to identify any certainties.

Once no more three-member sets can be found, four-member sets can be identified, and so on up to five-member sets. This puzzle has an example of a four-member set (1469) in a column (see below).

This puzzle example will now completely solve itself just by iterating through the singles algorithms - starting with the 9 highlighted in green.

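This is the SQL that identifies the partially complete sets, where n is the set size and p_type selects box (1), row (2) or column (3). Every candidate in the cells that make up a set is given the temporary pencil mark value of 2.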

UPDATE answers
SET    pencil_mark_ind = 2
WHERE  pencil_mark_ind > 0
AND    puzzle_id = p_puzzle_id
AND    (row_id, col_id) IN (
  SELECT
    /*--------------------------------------------------------------
      STEP E: This sub-query identifies the cells which are entirely
      made up of candidates belonging to the same set, spread across
      n cells in a box/row/column.
    --------------------------------------------------------------*/
         a.row_id, a.col_id
  FROM  (SELECT
           /*------------------------------------------------------
             STEP C: This in-line view identifies all the possible
             sets (combinations) of size n within a cell, keeping
             only the sets that contain every candidate counted in
             step A.
           ------------------------------------------------------*/
                a.box_id, a.row_id, a.col_id, pn.set_id, COUNT(*) cnt
         FROM   combinations pn,
               (SELECT
                  /*----------------------------------------------
                    STEP B: This in-line view retrieves all the
                    candidates from the cells having exactly n or
                    fewer candidates.
                  ----------------------------------------------*/
                       b.box_id, b.row_id, b.col_id, a.cnt, b.answer
                FROM  (SELECT
                         /*--------------------------------------
                           STEP A: This in-line view identifies
                           those cells which have exactly n (set
                           size) or fewer candidates in the cell.
                         --------------------------------------*/
                              a.box_id, a.row_id, a.col_id, COUNT(*) cnt
                       FROM   answers a
                       WHERE  a.puzzle_id = p_puzzle_id
                       AND    a.pencil_mark_ind > 0
                       GROUP BY a.box_id, a.row_id, a.col_id
                       HAVING COUNT(*) <= n) a,
                       answers b
                WHERE  b.puzzle_id = p_puzzle_id
                AND    b.row_id = a.row_id
                AND    b.col_id = a.col_id
                AND    b.pencil_mark_ind > 0) a
         WHERE  pn.set_size = n
         AND    a.answer = pn.puzzle_number
         GROUP BY a.box_id, a.row_id, a.col_id, pn.set_id
         HAVING COUNT(*) = MAX(a.cnt)) a,
        (SELECT
           /*------------------------------------------------------
             STEP D: This in-line view repeats steps A, B and C to
             identify the cells again, and then selects the
             box/row/column which has the set appearing in n cells
             in that box/row/column.
           ------------------------------------------------------*/
                DECODE(p_type, 1, a.box_id, 2, a.row_id, 3, a.col_id) id, a.set_id
         FROM  (SELECT a.box_id, a.row_id, a.col_id, pn.set_id, COUNT(*) cnt
                FROM   combinations pn,
                      (SELECT b.box_id, b.row_id, b.col_id, a.cnt, b.answer
                       FROM  (SELECT a.box_id, a.row_id, a.col_id, COUNT(*) cnt
                              FROM   answers a
                              WHERE  a.puzzle_id = p_puzzle_id
                              AND    a.pencil_mark_ind > 0
                              GROUP BY a.box_id, a.row_id, a.col_id
                              HAVING COUNT(*) <= n) a,
                              answers b
                       WHERE  b.puzzle_id = p_puzzle_id
                       AND    b.row_id = a.row_id
                       AND    b.col_id = a.col_id
                       AND    b.pencil_mark_ind > 0) a
                WHERE  pn.set_size = n
                AND    a.answer = pn.puzzle_number
                GROUP BY a.box_id, a.row_id, a.col_id, pn.set_id
                HAVING COUNT(*) = MAX(a.cnt)) a
         GROUP BY DECODE(p_type, 1, a.box_id, 2, a.row_id, 3, a.col_id), a.set_id
         HAVING COUNT(*) = n) s
  WHERE  s.id = DECODE(p_type, 1, a.box_id, 2, a.row_id, 3, a.col_id)
  AND    s.set_id = a.set_id);

To rub out the affected candidates, use this statement for box, row and column in turn (p_type = 1, 2 and 3)...

UPDATE answers a
SET    pencil_mark_ind = -1
WHERE  a.pencil_mark_ind > 0
AND    a.pencil_mark_ind != 2
AND    a.puzzle_id = p_puzzle_id
AND    (DECODE(p_type, 1, a.box_id, 2, a.row_id, 3, a.col_id), a.answer) IN (
         SELECT DECODE(p_type, 1, a2.box_id, 2, a2.row_id, 3, a2.col_id), a2.answer
         FROM   answers a2
         WHERE  a2.pencil_mark_ind = 2
         AND    a2.puzzle_id = p_puzzle_id);

After the candidates have been removed, the Singles algorithms (1 and 2) can be applied and iterated continuously until no more certainties can be found.