# NoCoug SQL Challenge Entry – May 15

Posted in: Technical Track

I love puzzles. So when I heard about the NoCoug SQL Challenge I felt tempted to give it a go.

The Northern California Oracle Users Group (NoCoug) has challenged us to find a good way to calculate the probability of getting different sums for x throws of a n-sided die using only SQL. The probabilities for the faces of a single die are stored in a table and that’s all you need to start playing with the problem. The SQL Challenge rules can be found on the NoCoug website, along with some other relevant information.

After working out my very first solution, I read the rules and found it wasn’t fit for the challenge, as it used non-SQL extensions (SQL*Plus). So I started again, this time using pure SQL. I came up with a few options but wasn’t happy with them from a performance perspective. They needed more sweating.

I find that walking is very good for thinking. Whenever I can, and when weather permits, I walk home from work at the end of the day. The distance between work and home is about 6 km, which takes me around one hour to cover. After you’ve done it a few times the walking becomes automatic and  you don’t have to think about it anymore; obstacles, kerbs, corners, street-crossings are all handled in auto-pilot mode. Then, as you don’t have anything else to do, you think.

During the days I was working on the challenge solution, the SQL query used to occupy my thoughts along most of my (almost) daily walk. I decided to use only standard SQL features and wanted to be able to run the solution on both Oracle and SQL Server. While walking, I started thinking on ways to improve my initial solution’s performance without having to resort to non-standard tricks.

### Joins of joins

The most straightforward way to calculate the probabilities for the possible sums in x throws of the die (whose face’s probabilities are stored in the `die` table) is to write a x-way self-cross-join of the table. For example, the probabilities for the sums in 3 throws of the die is given by:

```select
d1.face_value + d1.face_value + d2.face_value total_value,
sum(d1.probability * d2.probability * d3.probability) probability
from
die d1
cross join die d2
cross join die d3
group by d1.face_value + d1.face_value + d2.face_value
```

The problem is that the complexity of the query escalates when a high number of throws is involved; imagine writing and executing a 1024 join of that table. Also, the number of rows generated by the multiple cross-joins increases exponentially every time a new join is added to the query. It’s just not feasible to run the query above for a large number of throws. For a 20-sided die, for example, the query above yields 58 rows after the `GROUP BY` aggregation (totals ranging from 3 to 60). Before the aggregation can be performed, though, all the joins must be calculated, which generates `203` = 8000 combinations.

To improve performance, I needed a way to reduce the number of joins required to calculate the probabilities and the cost of those joins.

### Reusing calculations

Let’s suppose I pre-calculate the probabilities for n throws of the die. To calculate the chances for n+1 throws, I don’t need to start from scratch again—I can start from the probabilities for n throws and perform a Cartesian product (or a “cross-join”, in other words) of that, with the probabilities for one die. Similarly, to calculate the probability for 2n throws, I can use the probabilities for n throws and do a “self-cross-join.”

Using this principle, I thought I could pre-calculate probabilities for some chosen number of throws, and then use those results as building blocks to obtain the probabilities for any other numbers. The most obvious “sizes” for the building blocks would be powers of 2, since it’s easy to find a combination of them to generate any other number using binary arithmetics.

Thus, if the probabilities for 1, 2, 4, 8, … 2n throws of a die are pre-calculated, we can easily combine them to calculate probabilities for any number of throws up to 2(n+1)-1.

### Combining the building blocks

I wanted to use those building blocks together in a single parameterised SQL so I could generate probabilities for any given number of throws without having to rewrite the query every time. To do that, I used some binary arithmetic and logic to manipulate the joins between the building blocks.

Let’s say T1, T2 and T4 are the pre-calculated probabilities for the throw of 1, 2, and 4 dice, respectively. The probability for 7 throws of the die is given by:

```            T1
cross join
T2
cross join
T4
```

Or, alternatively:

```            (select face_value, probability from T1)
cross join
(select face_value, probability from T2)
cross join
(select face_value, probability from T4)
```

The query above, though, is not parameterised and only calculates probabilities for 7 throws of the die. Using a parameter (`&&throws`) we can manipulate which building blocks will be involved in the join:

```            (select face_value, probability from T1
where bitand(1, &&throws) != 0)
cross join
(select face_value, probability from T2
where bitand(2, &&throws) != 0)
cross join
(select face_value, probability from T4
where bitand(4, &&throws) != 0)
```

The query is now parameterised. We have a big problem, however: for `&&throws = 2`, for example, `bitand(1, &&throws) = 0`, which causes the first select to return no rows, which in consequence causes the whole cross-join to return no rows. And this is definitely not what we want.

To avoid this problem, whenever one of the building blocks is not to be used, instead of just suppressing it, we need to replace it with a neutral element—an element that could be used in the cross-join without altering the values. This element is the select shown below:

`select 0 face_value, 1 probability from dual`

The value `0` for `face_value` won’t change the sums for the dice faces, and the value `1` for probability won’t alter the probability products calculate in the cross join. Adding this element to the previous query we then have:

```            (select face_value, probability     from T1
where bitand(1, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(1, &&throws) = 0)
cross join
(select face_value, probability     from T2
where bitand(2, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(2, &&throws) = 0)
cross join
(select face_value, probability     from T4
where bitand(4, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(4, &&throws) = 0)
```

With this query, setting the `&&throws` parameter to any value ranging between 1 and 7 will correctly choose which of the pre-calculated building blocks should be included in the calculation. With one query and one parameter we can then have probabilities calculated for 7 different numbers of throws.

### Reducing cardinality

Another factor affecting the performance of the probabilities calculation is the cardinality of the subsequent joins, which escalate exponentially. With a  20-sided die, for example, after n throws you’ll have 20n rows in the result set. For 3 throws, that means 8000 rows in the resulting set.

For 3 throws of a 20-sided die, though, we can only have sums ranging from 3 (1+1+1) to 60 (20+20+20). That means that, in those 8000 rows we have only 58 distinct summed values. Probabilities for any given sum may appear several times because it may be obtained by summing different combinations of the die’s values. For example, using a 20-sided die, the sum 18 can be obtained in the following ways (among many, many others): 1+1+16, 1+2+15, 1+3+14, etc.

To help performance, I decided to perform an aggregation (`GROUP BY`) after every join to reduce the number of rows before the execution of the subsequent join. The aggregation operation itself has a cost but it pays off, preventing the cardinality of the joins from increasing exponentially.

Applying this idea to the query above we have:

```            (select face_value, probability     from T1
where bitand(1, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(1, &&throws) = 0)
cross join
(select T2.face_value + T4.face_value face_value,
sum(T2.probability * T4.probability) probability
from (
(select face_value, probability     from T2
where bitand(2, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(2, &&throws) = 0)
cross join
(select face_value, probability     from T4
where bitand(4, &&throws) != 0
union all
select 0 face_value, 1 probability from dual
where bitand(4, &&throws) = 0)
)
group by T2.face_value + T4.face_value
)
```

### The query

The final solution is the combination of the techniques described above. The query was written with standard SQL and makes use of Subquery Factoring (a.k.a. Common Table Expressions, a.k.a. the WITH clause) to calculate the building blocks’ probabilities. The probabilities for a 2n-throws building block is calculated based on the probabilities previously calculated for 2(n-1).

### Limitations

Each written query, like the one below, can calculate probabilities up to a certain number of throws (max # of throws =` 2Q - 1`, where Q is the number of T* subqueries used in the query). This number can be easily increased by adding new subqueries to the query. The maximum number of  throws increases exponentially for linear increases in the size of the query.

When testing on Oracle (10.2.0.4 and 11.1.0.7), I found that the maximum Q I could use for a successful execution was 9 (query below). For queries with more than 9 subqueries I got an ORA-600. I couldn’t  find an existing solution for the problem in Metalink.

### Code

```-- First International NoCoug SQL Challenge
-- Author: Andre Araujo ([email protected])
-- Objective: Calculate the probability of getting certain sums for N throws of a dice.
undefine throws
with
T1 as (
-- Results for the throw of 1 die
select d1.face_value face_value, sum(d1.probability) probability
from die d1
group by d1.face_value
)
,
T2 as (
-- Results for the throw of 2 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T1 d1
cross join T1 d2
group by d1.face_value + d2.face_value
)
,
T4 as (
-- Results for the throw of 4 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T2 d1
cross join T2 d2
group by d1.face_value + d2.face_value
)
,
T8 as (
-- Results for the throw of 8 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T4 d1
cross join T4 d2
group by d1.face_value + d2.face_value
)
,
T16 as (
-- Results for the throw of 16 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T8 d1
cross join T8 d2
group by d1.face_value + d2.face_value
)
,
T32 as (
-- Results for the throw of 32 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T16 d1
cross join T16 d2
group by d1.face_value + d2.face_value
)
,
T64 as (
-- Results for the throw of 64 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T32 d1
cross join T32 d2
group by d1.face_value + d2.face_value
)
,
T128 as (
-- Results for the throw of 128 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T64 d1
cross join T64 d2
group by d1.face_value + d2.face_value
)
,
T256 as (
-- Results for the throw of 256 dice
select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability
from T128 d1
cross join T128 d2
group by d1.face_value + d2.face_value
)
-- The select below combines the results above to achieve the wanted results.
-- The bitand checks ensure only necessary selects are executed.
select T1.face_value + T2.face_value face_value, sum(T1.probability * T2.probability) probability
from
(select face_value, probability from T1    where bitand(1, &&throws)    != 0 union all select 0, 1 from dual where bitand(1, &&throws)    = 0) T1
cross join
(select T2.face_value + T4.face_value face_value, sum(T2.probability * T4.probability) probability
from (select face_value, probability from T2    where bitand(2, &&throws)    != 0 union all select 0, 1 from dual where bitand(2, &&throws)    = 0) T2
cross join
(select T4.face_value + T8.face_value face_value, sum(T4.probability * T8.probability) probability
from (select face_value, probability from T4    where bitand(4, &&throws)    != 0 union all select 0, 1 from dual where bitand(4, &&throws)    = 0) T4
cross join
(select T8.face_value + T16.face_value face_value, sum(T8.probability * T16.probability) probability
from (select face_value, probability from T8    where bitand(8, &&throws)    != 0 union all select 0, 1 from dual where bitand(8, &&throws)    = 0) T8
cross join
(select T16.face_value + T32.face_value face_value, sum(T16.probability * T32.probability) probability
from (select face_value, probability from T16   where bitand(16, &&throws)   != 0 union all select 0, 1 from dual where bitand(16, &&throws)   = 0) T16
cross join
(select T32.face_value + T64.face_value face_value, sum(T32.probability * T64.probability) probability
from (select face_value, probability from T32   where bitand(32, &&throws)   != 0 union all select 0, 1 from dual where bitand(32, &&throws)   = 0) T32
cross join
(select T64.face_value + T128.face_value face_value, sum(T64.probability * T128.probability) probability
from (select face_value, probability from T64   where bitand(64, &&throws)   != 0 union all select 0, 1 from dual where bitand(64, &&throws)   = 0) T64
cross join
(select T128.face_value + T256.face_value face_value, sum(T128.probability * T256.probability) probability
from (select face_value, probability from T128  where bitand(128, &&throws)  != 0 union all select 0, 1 from dual where bitand(128, &&throws)  = 0) T128
cross join
(select face_value, probability from T256  where bitand(256, &&throws)  != 0 union all select 0, 1 from dual where bitand(256, &&throws)  = 0) T256
--
-- Grouping after each join ensures rows with equal sums are combined, reducing the cardinality for the next join.
--
group by T128.face_value + T256.face_value) T128
group by T64.face_value  + T128.face_value) T64
group by T32.face_value  + T64.face_value)  T32
group by T16.face_value  + T32.face_value)  T16
group by T8.face_value   + T16.face_value)  T8
group by T4.face_value   + T8.face_value)   T4
group by T2.face_value   + T4.face_value)   T2
group by T1.face_value   + T2.face_value
order by 1;
```

### Portability

The code above doesn’t use any special, non-standard SQL features, and can be easily ported to other databases with minor changes.

Running it against SQL Server, for example, all you need to do is to replace the `BITAND` function with the bitwise `AND` operator, and replace the use of the substitution variable with a TSQL variable.

### Testing

I ran the tests on an Oracle 10.2.0.4 instance. The results looked quite good and the solution seemed to scale as expected. I compared it to the performance of n straightforward joins of the `die` table (using `GROUP BY`s to reduce cardinality as explained above) and the results were quite similar.

The proposed solution, though, has the advantage of reduced size of the query, and that the same query can be used to calculated the probabilities for a different number of throws.

The results below were run for a 20-sided die. I ran tests for all the number of throws between 1 and 511. The values shown are the elapsed time average over 10 runs of the query for each number of throws. The complete test took 30 hours to complete. ## Author

• Want to talk with an expert? Schedule a call with our team to get the conversation started. #### André Araújo

DBA since 1998, having worked with Oracle from version 7.3.4 to the latest one. Working at Pythian since 2009. Waldar
May 20, 2009 3:09 pm

Wow. Very clean and efficient.

Log Buffer #147: a Carnival of the Vanities for DBAs | Pythian Group Blog
May 22, 2009 12:44 pm

[…] Pythian’s André Araujo responds to a different test with his very elegant NoCoug SQL Challenge Entry. […] Andre Araujo
May 25, 2009 12:10 am

Thanks, Waldar. Looking forward to seeing more solutions submitted. There’s been already some very interesting ones.

Challenges !!! | Waldar's SQLing and Datawarehousing Place
May 27, 2009 7:34 pm Laurent Schneider
June 2, 2009 4:36 am

I love this solution because it is so easy to understand! Really a fantastic piece of code and impressive explanation, congrats ! Andre Araujo
June 2, 2009 6:07 pm

Thanks, Laurent! I appreciate the feedback.

The First International NoCOUG SQL Challenge: Nine Ways To Change a Lightbulb « So Many Oracle Manuals, So Little Time
July 8, 2009 1:17 am

[…] ninth solution was sent in by André Araujo from Australia who used binary arithmetic and common table expressions […]

Who Should Tune SQL: The DBA or The Developer? « So Many Oracle Manuals, So Little Time
July 12, 2009 2:40 pm

[…] ninth solution—by André Araujo from Australia—used binary arithmetic and common table expressions to solve […]

NoCOUG Got Nine, TSQL Got Eleven at Waldar’s SQLing and Datawarehousing Place
July 19, 2009 9:01 pm

[…] I’m the seventh in the list there. As I told before, my favorite solution to this problem is André Aurajo’s one. […] Iggy Fernandez
July 19, 2009 11:51 pm

The first argument of the ORA-600 error is 15160. The ORA-600 troubleshooting tool in Metalink says that Oracle is “attempting to find a suitable table as the starting point in a complex join operation.” The real problem is that the estimated cost is too high for the optimizer to handle. The workaround is to introduce a Cardinality hint that instructs the query optimizer that the Die table has only one row :-) You will then be able to handle much higher values of the Throws variable.

T1 as (
— Results for the throw of 1 die
select /*+ CARDINALITY(d1 1) */ d1.face_value face_value, sum(d1.probability) probability
from die d1
group by d1.face_value
)

Regards,

Iggy

P.S. The contest is now closed and the winners will be announnced soon.

First International NoCoug SQL Challenge – And the Winner is… « I’m just a simple DBA on a complex production system
July 31, 2009 12:46 am

[…] wins the challenge for his wonderful solution using Discrete Fourier Transforms; the runner-up is André Araujo from Australia, who used binary arithmetic and common table expressions in his […]

Results of the First International NoCOUG SQL Challenge « So Many Oracle Manuals, So Little Time
July 31, 2009 3:45 pm

[…] ninth solution—by André Araujo from Australia—used binary arithmetic and common table […]

First International NoCOUG SQL Challenge Is Over ! at Waldar’s SQLing and Datawarehousing Place
August 1, 2009 9:45 am

[…] think his solution is the one I would have found by myself ! I’m sure the same clever logic could be reused for more problems […] 