I love puzzles. So when I heard about the NoCoug SQL Challenge I felt tempted to give it a go.
The Northern California Oracle Users Group (NoCoug) has challenged us to find a good way to calculate the probability of getting different sums for x throws of a n-sided die using only SQL. The probabilities for the faces of a single die are stored in a table and that’s all you need to start playing with the problem. The SQL Challenge rules can be found on the NoCoug website, along with some other relevant information.
After working out my very first solution, I read the rules and found it wasn’t fit for the challenge, as it used non-SQL extensions (SQL*Plus). So I started again, this time using pure SQL. I came up with a few options but wasn’t happy with them from a performance perspective. They needed more sweating.
I find that walking is very good for thinking. Whenever I can, and when weather permits, I walk home from work at the end of the day. The distance between work and home is about 6 km, which takes me around one hour to cover. After you’ve done it a few times the walking becomes automatic and you don’t have to think about it anymore; obstacles, kerbs, corners, street-crossings are all handled in auto-pilot mode. Then, as you don’t have anything else to do, you think.
During the days I was working on the challenge solution, the SQL query used to occupy my thoughts along most of my (almost) daily walk. I decided to use only standard SQL features and wanted to be able to run the solution on both Oracle and SQL Server. While walking, I started thinking on ways to improve my initial solution’s performance without having to resort to non-standard tricks.
Joins of joins
The most straightforward way to calculate the probabilities for the possible sums in x throws of the die (whose face’s probabilities are stored in the die
table) is to write a x-way self-cross-join of the table. For example, the probabilities for the sums in 3 throws of the die is given by:
select d1.face_value + d1.face_value + d2.face_value total_value, sum(d1.probability * d2.probability * d3.probability) probability from die d1 cross join die d2 cross join die d3 group by d1.face_value + d1.face_value + d2.face_value
The problem is that the complexity of the query escalates when a high number of throws is involved; imagine writing and executing a 1024 join of that table. Also, the number of rows generated by the multiple cross-joins increases exponentially every time a new join is added to the query. It’s just not feasible to run the query above for a large number of throws. For a 20-sided die, for example, the query above yields 58 rows after the GROUP BY
aggregation (totals ranging from 3 to 60). Before the aggregation can be performed, though, all the joins must be calculated, which generates 203
= 8000 combinations.
To improve performance, I needed a way to reduce the number of joins required to calculate the probabilities and the cost of those joins.
Reusing calculations
Let’s suppose I pre-calculate the probabilities for n throws of the die. To calculate the chances for n+1 throws, I don’t need to start from scratch again—I can start from the probabilities for n throws and perform a Cartesian product (or a “cross-join”, in other words) of that, with the probabilities for one die. Similarly, to calculate the probability for 2n throws, I can use the probabilities for n throws and do a “self-cross-join.”
Using this principle, I thought I could pre-calculate probabilities for some chosen number of throws, and then use those results as building blocks to obtain the probabilities for any other numbers. The most obvious “sizes” for the building blocks would be powers of 2, since it’s easy to find a combination of them to generate any other number using binary arithmetics.
Thus, if the probabilities for 1, 2, 4, 8, … 2n throws of a die are pre-calculated, we can easily combine them to calculate probabilities for any number of throws up to 2(n+1)-1.
Combining the building blocks
I wanted to use those building blocks together in a single parameterised SQL so I could generate probabilities for any given number of throws without having to rewrite the query every time. To do that, I used some binary arithmetic and logic to manipulate the joins between the building blocks.
Let’s say T1, T2 and T4 are the pre-calculated probabilities for the throw of 1, 2, and 4 dice, respectively. The probability for 7 throws of the die is given by:
T1 cross join T2 cross join T4
Or, alternatively:
(select face_value, probability from T1) cross join (select face_value, probability from T2) cross join (select face_value, probability from T4)
The query above, though, is not parameterised and only calculates probabilities for 7 throws of the die. Using a parameter (&&throws
) we can manipulate which building blocks will be involved in the join:
(select face_value, probability from T1 where bitand(1, &&throws) != 0) cross join (select face_value, probability from T2 where bitand(2, &&throws) != 0) cross join (select face_value, probability from T4 where bitand(4, &&throws) != 0)
The query is now parameterised. We have a big problem, however: for &&throws = 2
, for example, bitand(1, &&throws) = 0
, which causes the first select to return no rows, which in consequence causes the whole cross-join to return no rows. And this is definitely not what we want.
To avoid this problem, whenever one of the building blocks is not to be used, instead of just suppressing it, we need to replace it with a neutral element—an element that could be used in the cross-join without altering the values. This element is the select shown below:
select 0 face_value, 1 probability from dual
The value 0
for face_value
won’t change the sums for the dice faces, and the value 1
for probability won’t alter the probability products calculate in the cross join. Adding this element to the previous query we then have:
(select face_value, probability from T1 where bitand(1, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(1, &&throws) = 0) cross join (select face_value, probability from T2 where bitand(2, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(2, &&throws) = 0) cross join (select face_value, probability from T4 where bitand(4, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(4, &&throws) = 0)
With this query, setting the &&throws
parameter to any value ranging between 1 and 7 will correctly choose which of the pre-calculated building blocks should be included in the calculation. With one query and one parameter we can then have probabilities calculated for 7 different numbers of throws.
Reducing cardinality
Another factor affecting the performance of the probabilities calculation is the cardinality of the subsequent joins, which escalate exponentially. With a 20-sided die, for example, after n throws you’ll have 20n rows in the result set. For 3 throws, that means 8000 rows in the resulting set.
For 3 throws of a 20-sided die, though, we can only have sums ranging from 3 (1+1+1) to 60 (20+20+20). That means that, in those 8000 rows we have only 58 distinct summed values. Probabilities for any given sum may appear several times because it may be obtained by summing different combinations of the die’s values. For example, using a 20-sided die, the sum 18 can be obtained in the following ways (among many, many others): 1+1+16, 1+2+15, 1+3+14, etc.
To help performance, I decided to perform an aggregation (GROUP BY
) after every join to reduce the number of rows before the execution of the subsequent join. The aggregation operation itself has a cost but it pays off, preventing the cardinality of the joins from increasing exponentially.
Applying this idea to the query above we have:
(select face_value, probability from T1 where bitand(1, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(1, &&throws) = 0) cross join (select T2.face_value + T4.face_value face_value, sum(T2.probability * T4.probability) probability from ( (select face_value, probability from T2 where bitand(2, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(2, &&throws) = 0) cross join (select face_value, probability from T4 where bitand(4, &&throws) != 0 union all select 0 face_value, 1 probability from dual where bitand(4, &&throws) = 0) ) group by T2.face_value + T4.face_value )
The query
The final solution is the combination of the techniques described above. The query was written with standard SQL and makes use of Subquery Factoring (a.k.a. Common Table Expressions, a.k.a. the WITH clause) to calculate the building blocks’ probabilities. The probabilities for a 2n-throws building block is calculated based on the probabilities previously calculated for 2(n-1).
Limitations
Each written query, like the one below, can calculate probabilities up to a certain number of throws (max # of throws = 2Q - 1
, where Q is the number of T* subqueries used in the query). This number can be easily increased by adding new subqueries to the query. The maximum number of throws increases exponentially for linear increases in the size of the query.
When testing on Oracle (10.2.0.4 and 11.1.0.7), I found that the maximum Q I could use for a successful execution was 9 (query below). For queries with more than 9 subqueries I got an ORA-600. I couldn’t find an existing solution for the problem in Metalink.
Code
-- First International NoCoug SQL Challenge -- Author: Andre Araujo ([email protected]) -- Objective: Calculate the probability of getting certain sums for N throws of a dice. undefine throws with T1 as ( -- Results for the throw of 1 die select d1.face_value face_value, sum(d1.probability) probability from die d1 group by d1.face_value ) , T2 as ( -- Results for the throw of 2 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T1 d1 cross join T1 d2 group by d1.face_value + d2.face_value ) , T4 as ( -- Results for the throw of 4 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T2 d1 cross join T2 d2 group by d1.face_value + d2.face_value ) , T8 as ( -- Results for the throw of 8 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T4 d1 cross join T4 d2 group by d1.face_value + d2.face_value ) , T16 as ( -- Results for the throw of 16 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T8 d1 cross join T8 d2 group by d1.face_value + d2.face_value ) , T32 as ( -- Results for the throw of 32 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T16 d1 cross join T16 d2 group by d1.face_value + d2.face_value ) , T64 as ( -- Results for the throw of 64 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T32 d1 cross join T32 d2 group by d1.face_value + d2.face_value ) , T128 as ( -- Results for the throw of 128 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T64 d1 cross join T64 d2 group by d1.face_value + d2.face_value ) , T256 as ( -- Results for the throw of 256 dice select d1.face_value + d2.face_value face_value, sum(d1.probability * d2.probability) probability from T128 d1 cross join T128 d2 group by d1.face_value + d2.face_value ) -- The select below combines the results above to achieve the wanted results. -- The bitand checks ensure only necessary selects are executed. select T1.face_value + T2.face_value face_value, sum(T1.probability * T2.probability) probability from (select face_value, probability from T1 where bitand(1, &&throws) != 0 union all select 0, 1 from dual where bitand(1, &&throws) = 0) T1 cross join (select T2.face_value + T4.face_value face_value, sum(T2.probability * T4.probability) probability from (select face_value, probability from T2 where bitand(2, &&throws) != 0 union all select 0, 1 from dual where bitand(2, &&throws) = 0) T2 cross join (select T4.face_value + T8.face_value face_value, sum(T4.probability * T8.probability) probability from (select face_value, probability from T4 where bitand(4, &&throws) != 0 union all select 0, 1 from dual where bitand(4, &&throws) = 0) T4 cross join (select T8.face_value + T16.face_value face_value, sum(T8.probability * T16.probability) probability from (select face_value, probability from T8 where bitand(8, &&throws) != 0 union all select 0, 1 from dual where bitand(8, &&throws) = 0) T8 cross join (select T16.face_value + T32.face_value face_value, sum(T16.probability * T32.probability) probability from (select face_value, probability from T16 where bitand(16, &&throws) != 0 union all select 0, 1 from dual where bitand(16, &&throws) = 0) T16 cross join (select T32.face_value + T64.face_value face_value, sum(T32.probability * T64.probability) probability from (select face_value, probability from T32 where bitand(32, &&throws) != 0 union all select 0, 1 from dual where bitand(32, &&throws) = 0) T32 cross join (select T64.face_value + T128.face_value face_value, sum(T64.probability * T128.probability) probability from (select face_value, probability from T64 where bitand(64, &&throws) != 0 union all select 0, 1 from dual where bitand(64, &&throws) = 0) T64 cross join (select T128.face_value + T256.face_value face_value, sum(T128.probability * T256.probability) probability from (select face_value, probability from T128 where bitand(128, &&throws) != 0 union all select 0, 1 from dual where bitand(128, &&throws) = 0) T128 cross join (select face_value, probability from T256 where bitand(256, &&throws) != 0 union all select 0, 1 from dual where bitand(256, &&throws) = 0) T256 -- -- Grouping after each join ensures rows with equal sums are combined, reducing the cardinality for the next join. -- group by T128.face_value + T256.face_value) T128 group by T64.face_value + T128.face_value) T64 group by T32.face_value + T64.face_value) T32 group by T16.face_value + T32.face_value) T16 group by T8.face_value + T16.face_value) T8 group by T4.face_value + T8.face_value) T4 group by T2.face_value + T4.face_value) T2 group by T1.face_value + T2.face_value order by 1;
Portability
The code above doesn’t use any special, non-standard SQL features, and can be easily ported to other databases with minor changes.
Running it against SQL Server, for example, all you need to do is to replace the BITAND
function with the bitwise AND
operator, and replace the use of the substitution variable with a TSQL variable.
Testing
I ran the tests on an Oracle 10.2.0.4 instance. The results looked quite good and the solution seemed to scale as expected. I compared it to the performance of n straightforward joins of the die
table (using GROUP BY
s to reduce cardinality as explained above) and the results were quite similar.
The proposed solution, though, has the advantage of reduced size of the query, and that the same query can be used to calculated the probabilities for a different number of throws.
The results below were run for a 20-sided die. I ran tests for all the number of throws between 1 and 511. The values shown are the elapsed time average over 10 runs of the query for each number of throws. The complete test took 30 hours to complete.
15 Comments. Leave new
Wow. Very clean and efficient.
[…] Pythian’s André Araujo responds to a different test with his very elegant NoCoug SQL Challenge Entry. […]
Thanks, Waldar. Looking forward to seeing more solutions submitted. There’s been already some very interesting ones.
[…] Chen Shapira wrote a summary about this challenge. In the comments i saw a trackback (or pingback, whatever it is) leading to André Aurajo’s solution. […]
I love this solution because it is so easy to understand! Really a fantastic piece of code and impressive explanation, congrats !
Thanks, Laurent! I appreciate the feedback.
[…] ninth solution was sent in by André Araujo from Australia who used binary arithmetic and common table expressions […]
[…] ninth solution—by André Araujo from Australia—used binary arithmetic and common table expressions to solve […]
[…] I’m the seventh in the list there. As I told before, my favorite solution to this problem is André Aurajo’s one. […]
The first argument of the ORA-600 error is 15160. The ORA-600 troubleshooting tool in Metalink says that Oracle is “attempting to find a suitable table as the starting point in a complex join operation.” The real problem is that the estimated cost is too high for the optimizer to handle. The workaround is to introduce a Cardinality hint that instructs the query optimizer that the Die table has only one row :-) You will then be able to handle much higher values of the Throws variable.
T1 as (
— Results for the throw of 1 die
select /*+ CARDINALITY(d1 1) */ d1.face_value face_value, sum(d1.probability) probability
from die d1
group by d1.face_value
)
Regards,
Iggy
P.S. The contest is now closed and the winners will be announnced soon.
[…] wins the challenge for his wonderful solution using Discrete Fourier Transforms; the runner-up is André Araujo from Australia, who used binary arithmetic and common table expressions in his […]
[…] ninth solution—by André Araujo from Australia—used binary arithmetic and common table […]
[…] think his solution is the one I would have found by myself ! I’m sure the same clever logic could be reused for more problems […]
[…] is my own solution to the problem. It is a variant of André Araujo’s solution; it is equally fast but has no logical limitations. However, it does have a hidden physical […]
The Second International NoCOUG SQL Challenge has been published in the February 2011 issue of the NoCOUG Journal. The Journal can be downloaded from https://bit.ly/gVNZsW. SQL commands for creating the required data are available at https://bit.ly/g58WVn.