
How the optimizer works.

    Thanks to Jim Starkey, here's an overview of the IB optimizer's phases. I've reproduced the full email so the context stays accurate:

From: Jim Starkey <>
At 12:41 AM 3/26/00 -0400, Claudio Valderrama C. wrote:
>From: "Claudio Valderrama C." <>
>
> Speaking about optimizations and indexes, Jim, can you write a bit about
>indexes and the query optimizer, please? There has been some controversy on
>how Interbase takes into account indexes to generate optimum query plans.

Had I my druthers, I'd duck this one; the current regime and I don't see eye to eye. I'll start with the big picture, then bad-mouth plans.

The Interbase optimizer has three general phases: join order selection, index join generation (streams), and joining of streams into rivers by sort/merge.

Join order selection is cost based, with the optimizer making a crude guess of the number of pages necessary to get to a record. This involves guesses at the cardinality of the various tables and the selectivity of indexes, none of which is known with any accuracy. In theory this is simple: consider all possible join orders and pick the cheapest. And this is what V1 did. Worked pretty well. Then somebody threw a 17-way join at it. It still worked pretty well -- the resultant query executed in less than 10 seconds. Unfortunately, the optimization phase took 14 hours.
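
To make the V1 search concrete, here is a minimal sketch in Python. Everything in it -- the names, the stats, and the cost formula -- is an illustrative assumption, not InterBase code: guess a cost for every join order and keep the cheapest.

    from itertools import permutations

    def guess_cost(order, cardinality, selectivity):
        """Crude page-count guess: fetches for each table scale with the
        rows surviving the joins so far, scaled by index selectivity."""
        cost, rows = 0.0, 1.0
        for table in order:
            fetches = rows * cardinality[table] * selectivity[table]
            cost += fetches
            rows = fetches
        return cost

    def best_join_order(tables, cardinality, selectivity):
        # Consider every permutation and keep the cheapest -- O(n!),
        # which is exactly why a 17-way join took 14 hours to optimize.
        return min(permutations(tables),
                   key=lambda o: guess_cost(o, cardinality, selectivity))

    cardinality = {"orders": 100_000, "customers": 5_000, "items": 400}
    selectivity = {"orders": 0.001, "customers": 0.01, "items": 1.0}
    print(best_join_order(cardinality.keys(), cardinality, selectivity))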

So the optimizer got a lot smarter. It learned to always take unique indexes ahead of non-unique and non-indexed ones, and at every stage to compute the best case for the remaining subtree to compare against the current best solution. After a couple of days of reasonably hard work, the 14-hour optimization came down to about 45 seconds with the same answer.
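
A hedged sketch of that smarter search, under the same caveat that all stats and formulas here are invented: rank candidate tables so unique indexes come first, and prune any partial order whose optimistic best case already loses to the best complete plan found so far.

    import math

    # Hypothetical per-table stats: unique index? any index? guessed pages.
    STATS = {
        "orders":    {"unique": False, "indexed": True,  "pages": 1000},
        "customers": {"unique": True,  "indexed": True,  "pages": 50},
        "items":     {"unique": False, "indexed": False, "pages": 200},
    }

    def index_rank(table):
        s = STATS[table]
        return 0 if s["unique"] else 1 if s["indexed"] else 2

    def step_cost(table, depth):
        # Crude guess: deeper joins touch more pages; unique lookups stay cheap.
        return STATS[table]["pages"] * (1 if STATS[table]["unique"] else depth + 1)

    def lower_bound(remaining):
        # Optimistic best case for the untried subtree: every remaining
        # table joined as cheaply as it possibly could be.
        return sum(step_cost(t, 0) for t in remaining)

    def search(prefix, remaining, cost, best):
        if cost + lower_bound(remaining) >= best[0]:
            return                          # prune: this subtree can't win
        if not remaining:
            best[0], best[1] = cost, list(prefix)
            return
        for table in sorted(remaining, key=index_rank):  # unique indexes first
            prefix.append(table)
            search(prefix, remaining - {table},
                   cost + step_cost(table, len(prefix) - 1), best)
            prefix.pop()

    best = [math.inf, None]
    search([], set(STATS), 0, best)
    print(best)   # same answer as the exhaustive search, far fewer orders tried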

As an aside, the Interbase optimizer has much less work to do than most (excluding, of course, Rdb/Eln and Netfrastructure). Traditional relational database systems need to worry about index selection as well. Interbase is perfectly happy, yea eager, to AND/OR bitmaps from multiple index scans before fetching records. The more the merrier.
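
The bitmap trick fits in a few lines. This toy fakes each index scan as a set of record numbers and ANDs the sets before any record is fetched; the real engine uses sparse bitmaps, and every name below is invented for the example.

    def index_scan(index, low, high):
        """Fake index scan: return the 'bitmap' (a set) of matching record ids."""
        return {rec for key, rec in index if low <= key <= high}

    ix_age  = [(20, 1), (25, 2), (31, 3), (42, 4)]           # (key, record id)
    ix_city = [("boston", 4), ("madrid", 2), ("santiago", 3)]

    # WHERE age BETWEEN 21 AND 45 AND city = 'santiago': one bitmap per
    # usable index, ANDed together; only then are the records fetched.
    hits = index_scan(ix_age, 21, 45) & index_scan(ix_city, "santiago", "santiago")
    print(sorted(hits))   # -> [3]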

Once the join order has been established, the optimizer loops through the streams looking for an index or indexes to get to the next stream. If it can't find one, it makes the previous sequence of streams into a tributary. The next phase combines the tributaries into rivers by sort/merge joins, recursively, until one big fat river results. Voila! (In fact, sort/merge joins are rare -- usually mother nature telling you that your database design stinks, and the existence of a marketing department reminding you that, well, never mind.)
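
For readers who haven't met one, a sort/merge join is easy to sketch: sort both tributaries on the join key, then merge them in one pass. This toy version (shapes and names are assumptions, not engine code) handles runs of duplicate keys too.

    def sort_merge_join(left, right, key):
        """Join two unordered streams: sort both on the join key, then
        merge in a single pass, emitting runs of equal keys in full."""
        left, right = sorted(left, key=key), sorted(right, key=key)
        i = j = 0
        while i < len(left) and j < len(right):
            kl, kr = key(left[i]), key(right[j])
            if kl < kr:
                i += 1
            elif kl > kr:
                j += 1
            else:
                j0 = j                     # remember start of the equal run
                while j < len(right) and key(right[j]) == kl:
                    yield left[i] + right[j]
                    j += 1
                i, j = i + 1, j0           # rescan the run for the next left row

    rows = sort_merge_join([(2, "b"), (1, "a")], [(3, "y"), (2, "x")],
                           key=lambda r: r[0])
    print(list(rows))   # -> [(2, 'b', 2, 'x')]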

The output of the optimizer is a linked tree of objects called RSBs (Record Source Blocks), each of which does something simple well. Among the RSBs are exhaustive retrieval, indexed retrieval, boolean sieve, cross product, sort merge, and sort. Each RSB has three entry points -- Open, Fetch, and Close -- and each blindly calls other RSBs to feed it a stream. In other words, the runtime requires (and has) no intelligence. The optimizer is supposed to be smart and the execution dumb and fast.
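
The Open/Fetch/Close protocol is simple enough to sketch with two toy RSBs -- the class names are mine, not InterBase's. Note how dumb the runtime loop is: it just pumps fetch() on the top node while each RSB blindly pulls from the one below it.

    class Exhaustive:
        """Leaf RSB: exhaustive retrieval over an in-memory 'table'."""
        def __init__(self, rows): self.rows = rows
        def open(self): self.pos = 0
        def fetch(self):
            if self.pos == len(self.rows):
                return None                 # end of stream
            self.pos += 1
            return self.rows[self.pos - 1]
        def close(self): pass

    class BooleanSieve:
        """Filter RSB: passes through only the rows matching a predicate."""
        def __init__(self, source, pred): self.source, self.pred = source, pred
        def open(self): self.source.open()
        def fetch(self):
            while (row := self.source.fetch()) is not None:
                if self.pred(row):
                    return row
            return None
        def close(self): self.source.close()

    # The runtime just pumps the top of the tree; all the intelligence
    # was spent by the optimizer when it wired the RSBs together.
    rsb = BooleanSieve(Exhaustive([1, 2, 3, 4]), lambda r: r % 2 == 0)
    rsb.open()
    while (row := rsb.fetch()) is not None:
        print(row)                          # -> 2, then 4
    rsb.close()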

<flame>
A cost based optimizer is an attractive nuisance. Many people think in terms of rule based optimizers (which don't scale with complexity), and can't resist the temptation to override a cost based optimizer with the odd rule here and there to get around a glitch. This usually (read "always") breaks the cost-based scheme with really embarrassing pessimizations. Rather than go back and repair the architectural damage, the temptation is to build in a mechanism of "plans" to report and/or override the optimizer.

A plan, in essence, is a bug on a bug. So the failure to back up a plan required by a database procedure in gbak is a bug on a bug on a bug. Plans are definitely in the category of things database developers invent to make poor performance somebody else's fault.
</flame>

Plans are now part of Interbase and therefore can't go away. I hope that the need for plans goes away in the very near future.

Jim Starkey


This page was last updated on 2000-07-23 01:32:49