Oracle parallel query hints reference – part 4: GBY_PUSHDOWN


Welcome to part 4 of the series.

GBY_PUSHDOWN is a very interesting hint, introduced in Oracle 10.2. It doesn't have the PQ prefix, but it nevertheless plays a crucial role in parallel query plans.

Purpose: When GBY_PUSHDOWN is activated, an extra aggregation step is introduced into the plan: the PQ workers that read the data pre-aggregate it locally before re-shuffling it and sending it to the PQ receivers for the final aggregation.
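
The examples below run against a table called TLARGE with a low-cardinality column MOD5_ID (only five distinct values). The exact table definition used in this series isn't shown in this post, but a minimal, hypothetical setup along these lines would produce a comparable data set (100,000 rows, MOD5_ID holding values 0 through 4):

-- Hypothetical setup: 100,000 rows, MOD5_ID has only 5 distinct values (0-4).
create table tlarge as
select level as id,
       mod(level, 5) as mod5_id,
       rpad('x', 100, 'x') as padding
from dual
connect by level <= 100000;

exec dbms_stats.gather_table_stats(user, 'TLARGE');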

Example NO PUSHDOWN:

select /*+PARALLEL(4) NO_GBY_PUSHDOWN  */ mod5_id, count(*) from tlarge group by mod5_id;
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      5 |    15 |   640   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                 |          |        |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)           | :TQ10001 |      5 |    15 |   640   (1)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                |          |      5 |    15 |   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                  |          |    100K|   292K|   638   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH               | :TQ10000 |    100K|   292K|   638   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR         |          |    100K|   292K|   638   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| TLARGE   |    100K|   292K|   638   (0)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------
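
(Side note: if you want to reproduce plans like the ones in this post, a plain EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY is enough. A minimal sketch, assuming the TLARGE table from the setup above; this is not necessarily how the output shown here was captured:)

explain plan for
select /*+ PARALLEL(4) NO_GBY_PUSHDOWN */ mod5_id, count(*)
from tlarge
group by mod5_id;

select * from table(dbms_xplan.display);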

Example PUSHDOWN:

select /*+PARALLEL(4) GBY_PUSHDOWN  */ mod5_id, count(*) from tlarge group by mod5_id;
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |      5 |    15 |   640   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                  |          |        |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)            | :TQ10001 |      5 |    15 |   640   (1)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                 |          |      5 |    15 |   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                   |          |      5 |    15 |   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH                | :TQ10000 |      5 |    15 |   640   (1)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY              |          |      5 |    15 |   640   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR         |          |    100K|   292K|   638   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS STORAGE FULL| TLARGE   |    100K|   292K|   638   (0)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------

See the extra step #6 HASH GROUP BY. Each PQ reader will read a set of blocks and aggregate the data in its local memory space by mod5_id. As a reminder, there are only 5 values for mod5_id, so the memory needed for the aggregation will be very small. When there is no more work assigned to a PQ reader process, it will distribute its aggregated data based on the hash key for the final aggregation.

This is best illustrated with the communication report for each query (V$PQ_TQSTAT report). Note that this can only be extracted from the session that ran the SQL.

V$PQ_TQSTAT query:

select dfo_number "d", tq_id as "t", server_type, num_rows,
       rpad('x', round(num_rows*10/nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), 'x') as "pr",
       round(bytes/1024/1024) mb, process, instance i,
       round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type)*100) as "%",
       open_time, avg_latency, waits, timeouts,
       round(bytes/nullif(num_rows, 0)) as "b/r"
  from v$pq_tqstat
 order by dfo_number, tq_id, server_type desc, process;
select /*+PARALLEL(2) NO_GBY_PUSHDOWN  */ mod5_id, count(*) from tlarge t group by mod5_id;
 d  t SERVER_TYPE NUM_ROWS pr         MB PROCESS I   % OPEN_TIME AVG_LATENCY WAITS TIMEOUTS b/r
 1  0 Producer       53608 xxxxxxxxxx  0 P002    1  54         0           0    37       18   4
 1  0 Producer       46392 xxxxxxxxx   0 P003    1  46         0           0    36       18   4
 1  0 Consumer       60000 xxxxxxxxxx  0 P000    1  60         0           0   188      185   4
 1  0 Consumer       40000 xxxxxxx     0 P001    1  40         0           0   188      185   4
 1  1 Producer           3 xxxxxxxxxx  0 P000    1  60         0           0    24       12  16
 1  1 Producer           2 xxxxxxx     0 P001    1  40         0           0    16        8  20
 1  1 Consumer           5 xxxxxxxxxx  0 QC      1 100         0           0     3        0  17
select /*+PARALLEL(2) GBY_PUSHDOWN  */ mod5_id, count(*) from tlarge t group by mod5_id;
 d  t SERVER_TYPE NUM_ROWS pr         MB PROCESS I   % OPEN_TIME AVG_LATENCY WAITS TIMEOUTS b/r
 1  0 Producer           5 xxxxxxxxxx  0 P002    1  50         0           0     0        0  25
 1  0 Producer           5 xxxxxxxxxx  0 P003    1  50         0           0     0        0  25
 1  0 Consumer           6 xxxxxxxxxx  0 P000    1  60         0           0    84       81  24
 1  0 Consumer           4 xxxxxxx     0 P001    1  40         0           0    84       81  28
 1  1 Producer           3 xxxxxxxxxx  0 P000    1  60         0           0     2        1  16
 1  1 Producer           2 xxxxxxx     0 P001    1  40         0           0     2        1  20
 1  1 Consumer           5 xxxxxxxxxx  0 QC      1 100         0           0     3        0  17

Notice how, for the query WITHOUT the pushdown, the producers sent 53,608 and 46,392 rows respectively to the consumers – essentially the whole table. Given that there are only five distinct values and two consumers, the hash distribution gave one consumer 20,000 extra rows, so the consumers received 60,000 and 40,000 rows each.

In the case WITH the pushdown, each producer sent only five rows (one per group), significantly reducing the amount of data exchanged between the parallel query workers.

The drawback of this optimization shows up when the number of unique values in the columns of the GROUP BY clause is very high (i.e. each group has very few members): the extra aggregation step then does more work while barely reducing the data that has to be re-distributed (see the hypothetical example below). But in most cases, the savings are quite significant.
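
For example, a hypothetical query that groups by a (nearly) unique column – here the ID column from the setup sketch above – would gain almost nothing from the pushdown: each PQ reader would build roughly as many groups as rows it read, and would still have to send nearly all of them across:

-- Hypothetical worst case: ID is unique, so the local HASH GROUP BY emits
-- about as many rows as it reads and the re-shuffle volume is not reduced.
select /*+ PARALLEL(4) GBY_PUSHDOWN */ id, count(*)
from tlarge
group by id;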

NOTE: This is a cost-based optimization, so it's entirely possible for the optimizer to choose NOT to use it when in fact it should.
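
One way to verify what the optimizer actually decided for a given statement is to look at the plan outline, which typically includes a GBY_PUSHDOWN entry when the pushdown was applied. A minimal sketch using DBMS_XPLAN.DISPLAY_CURSOR, run in the same session right after the statement:

select /*+ PARALLEL(4) */ mod5_id, count(*) from tlarge group by mod5_id;

-- Show the plan of the last executed statement together with its outline hints.
select * from table(dbms_xplan.display_cursor(format => '+OUTLINE'));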

About the Author

An Oracle ACE with a deep understanding of databases, application memory, and input/output interactions, Christo is an expert at optimizing the performance of the most complex infrastructures. Methodical and efficiency-oriented, he equates the role of an ATCG Principal Consultant in many ways to that of a data analyst: both require a rigorous sifting-through of information to identify solutions to often large and complex problems. A dynamic speaker, Christo has delivered presentations at the IOUG, the UKOUG, the Rocky Mountain Oracle Users Group, Oracle Open World, and other industry conferences.
