Thursday, April 21, 2011

Referential integrity and optimizer shortcuts

I overheard again just the other day that database performance can be improved by eliminating foreign keys and doing integrity constraint checking in the application. This fallacy is based on the notion that there must be some database overhead for enforcing foreign key constraints, so the database will always be faster if it can avoid this overhead. In fact, Oracle's foreign key implementation is very efficient, and in real world cases, the overhead of foreign keys is negligible, and can sometimes even provide slightly faster DML (see, for example, Tom Kyte).

Moreover, referential integrity can provide huge performance gains in situations where Oracle's optimizer can use it to devise better execution plans. I stumbled upon an example of this recently in a production system and thought I'd share it.

The query below involves five tables related by foreign keys. The joins follow these foreign keys.

select count(*)
 from goats g,
      properties p,
      doelings d,
      goat_versions v,
      farm_animals f
where g.type_id = :1
  and g.property_id = p.property_id
  and g.goat_id = d.doeling_id
  and d.status = 'alive'
  and g.goat_id = v.alive_id
  and f.animal_id = g.goat_id
;

The execution plan with predicates is shown below. In this farm animal database, index names start with their table name prefix. Please do not spend too much time studying this plan; I'll walk you through the important parts below.

-----------------------------------------------------------------------
| ID  | OPERATION                       | NAME                        |
-----------------------------------------------------------------------
|   1 |  SORT AGGREGATE                 |                             |
|   2 |   NESTED LOOPS                  |                             |
|   3 |    NESTED LOOPS                 |                             |
|   4 |     NESTED LOOPS                |                             |
|*  5 |      TABLE ACCESS BY INDEX ROWID| GOATS                       |
|*  6 |       INDEX RANGE SCAN          | GOATS_TYPE_IDX              |
|*  7 |      INDEX UNIQUE SCAN          | GOAT_VERSIONS_ALIVE_ID_UNIQ |
|*  8 |     INDEX UNIQUE SCAN           | FARM_ANIMALS_PK             |
|*  9 |    TABLE ACCESS BY INDEX ROWID  | DOELINGS                    |
|* 10 |     INDEX UNIQUE SCAN           | DOELINGS_DOELING_ID_PK      |
-----------------------------------------------------------------------

PREDICATE INFORMATION (IDENTIFIED BY OPERATION ID):
---------------------------------------------------

   5 - FILTER("G"."PROPERTY_ID" IS NOT NULL)
   6 - ACCESS("G"."TYPE_ID"=:1)
   7 - ACCESS("G"."GOAT_ID"="V"."ALIVE_ID")
       FILTER("V"."ALIVE_ID" IS NOT NULL)
   8 - ACCESS("G"."GOAT_ID"="F"."ANIMAL_ID")
   9 - filter("D"."STATUS"='alive')
  10 - ACCESS("G"."GOAT_ID"="D"."DOELING_ID")

First, notice how the PROPERTIES table (P) is not accessed, nor or any of its indexes! Second, notice how the predicate for operation #5 (g.property_id is not null) is not in the original SQL above; the Oracle optimizer added it on its own!

What is going on?

The Oracle optimizer knows that the PROPERTY_ID column in child table GOATS (G) is a foreign key to parent table PROPERTIES (P). The query SQL text includes the join condition "G.PROPERTY_ID = P.PROPERTY_ID." But due to referential integrity, all values of G.PROPERTY_ID must also exist in P.PROPERTY_ID. Therefore, if G.PROPERTY_ID is not null, then this join condition must be satisfied.

The Oracle optimizer has used this knowledge to eliminate the need to access parent table PROPERTIES (P) or any of its indexes. This provides a significant cost savings. This shortcut would not have been available if referential integrity had been enforced by the application.

No comments:

Post a Comment