The mystery of rdb$db

Back
Home
Up
Next

The mystery of rdb$db_key III.

One of the things that most confuses people not used to db_key is the fact it needs almost all the time to be qualified explicitly with the table name. For example,

select rdb$db_key from table
will work without problems, but

select rdb$db_key, field from table
won't work even if it's your birthday. In this case, you need explicit qualification:

select table.rdb$db_key, field from table

When the name of the table is long or convoluted or there are more than one table, table aliases come in handy to make sentences more clear as it was shown in the previous page of this topic. Table aliases are treated as a synonym of the table you're using, so you can put them wherever you'll put the original table's name. In this case "this is my table" is the real name whereas t is a synonym. The real name has to be surrounded by double quotes because it has spaces. This name is only possible in IB6 using dialect 3:

select t.rdb$db_key, t.pkey, t.oxygen from "this is my table" t

and the sample applies when you need db_key plus all other fields, with a little additional trick:

select t.rdb$db_key, t.* from "this is my table" t

Here, if you intend to put only asterisk, you'll get a syntax error. Beware that the use of asterisk is discouraged, because the order of fields may change as the table is restructured or new and unexpected fields may appear that would render a FOR/DO/INTO construction useless in a stored procedure (of course, the engine has some protection against changing objects that have dependant objects as it's the case of a table with a procedure using it, but don't count on this protection as being 100% foolproof).

You should note that, as it was written in the first page, db_keys are per table and use 8 bytes (except in views, as it was written, too), so if you want to use that property on all tables of a join, you must read each db_key; for example:

for select c.id, c.rdb$db_key, s.id, s.rdb$db_key from cats c join squirrels s on c.owner = s.owner into :cat_dbk, :squi_dbk do begin update cats set partner_squirrel = s.id where cats.rdb$db_key = cat_dbk; update squirrels set partner_cat = c.id where squirrels.rdb$db_key = squi_dbk; end

Note the example is very contrived and nonsense: there's no guarantee a person has to have only one squirrel and only one cat at a maximum, so the link has to be made really through a third table to support the N<->N relationship, namely, a NUB. (BTW, do you need to link cats and squirrels with the same owner?) However, using this construction you can improve the performance of operations that need to update two or more tables at the same time and usually with interdependent information. Had the procedure being written in a typical DSQL way, it would have required two separate SQL update statements with an inner select and furthermore, you cannot take easy control of a query plan that's the inner part of an insert, delete or update statement. Using a procedure instead, it can be written in a traditional form, using the retrieved primary keys (c.id & s.id) in the WHERE clause, but would be slower as the engine would have performed a search by PK or a natural scan, whereas with db_keys as shown above, the procedure runs as fast as allowed by the engine, because it locates each record by its position that was obtained in the select part.

I have stated in the prior paragraph that's when a query is nested in another SQL statement, you cannot take easy control of the nested query's plan. One reason is IB generates two separate plans for these sentences (one for the outer sentence and other for the inner sentence) and the syntax to force an explicit SQL plan doesn't cater for more than one plan. To be accurate, you can if you put each plan after the sentence where it's used, but probably the example doesn't look very attractive, using the example employee database that comes with IB:

select emp_no, dept_no, (select department from department d where d.dept_no = e.dept_no PLAN (DEPARTMENT INDEX (RDB$PRIMARY5)) ) as dept_name from employee e PLAN (E NATURAL)

Furthermore, using a plan that a human being estimates to be faster can be simply rejected by the IB optimizer because it has some "prejudices" about bad and good plans. As the outer an inner sentences become more complex, formulating by hand a better plan and having the optimizer to agree on it starts to be a really tricky exercise not recommended for the faint of heart.

Until now, examples of insertions and updates have been shown. Now, an example of a deletion is presented. The idea is very simple:

delete from employee where emp_no not in (select emp_no from employee_project) and job_code in ('Eng', 'Mngr')

So we are mimicking a company that wants to lower costs by dropping employees with no current project assignments. IB will answer with these plans in the example database employee.gdb:
PLAN (EMPLOYEE_PROJECT INDEX (RDB$PRIMARY14))
PLAN (EMPLOYEE INDEX (RDB$FOREIGN9,RDB$FOREIGN9))

However, the task is not completed: we need to delete the salary history or these employees won't be deleted by IB to maintain referential integrity. We would try to assign these data to an special employee, but in this case we would wipe out the history, too and this must be done before the other statement:

delete from salary_history where emp_no in (select emp_no from employee where emp_no not in (select emp_no from employee_project) and job_code in ('Eng', 'Mngr'))

so we are converting the DELETE previously outlined now as a SELECT and using them to find what salary_history entries to delete. Why the nesting? Because job_code is an attribute of employee and not from employee_project, so if there's a shortcut, it's not obvious. IB will answer with this plan:
PLAN (EMPLOYEE_PROJECT INDEX (RDB$PRIMARY14))
PLAN (EMPLOYEE INDEX (RDB$PRIMARY7,RDB$FOREIGN9,RDB$FOREIGN9))
PLAN (SALARY_HISTORY NATURAL)

Doesn't look a very simple SQL plan, right? Someone should try to optimize these two statements or deleting idle employees will cause all employees going idle while the server eats resources trying to delete all needed entries. The main part is the same in both sentences, but one uses DELETE and the other uses SELECT. We can make them only one in a stored procedure:

for select emp_no from employee where emp_no not in (select emp_no from employee_project) and job_code in ('Eng', 'Mngr') into :emp_no do begin delete from salary_history where emp_no = :emp_no; delete from employee where emp_no = :emp_no; end;

Probably runs faster, but it's not optimized yet. First. the NOT IN test is slow. Who said it cannot be converted to a join? Let's see:

select e.emp_no from employee e left join employee_project ep on e.emp_no = ep.emp_no where e.job_code in ('Eng', 'Mngr') and ep.emp_no is NULL

This part looks promising because IB is using all available indexes:
PLAN JOIN (E INDEX (RDB$FOREIGN9,RDB$FOREIGN9),EP INDEX (RDB$PRIMARY14))

If you change
where e.job code in ('Eng', 'Mngr')
to be instead
where e.job_code = 'Eng' or e.job_code = 'Mngr'
the plan changes to become
PLAN JOIN (E NATURAL,EP INDEX (RDB$PRIMARY14))
and this means if you prefer the OR to the IN construction, IB does a natural scan (sequential scan) of the employees, looking for the ones that are either in engineering or management. The use of the index thanks to the IN clause can be a benefit when people in engineering and management are a few compared with the total amount of employees; otherwise, the natural scan seems to be better.

Note there's a catch: actually the test for 'Eng' and 'Mngr' became part of the WHERE and not a second condition AND'ed with the JOIN condition. If you use it as a second condition in the JOIN, you get a completely different SQL plan and your results aren't the expected at first glance: the condition won't filter anything... you could be dropping busy employees in a real case. This is not a bug in Interbase but the way outer joins work! The moral is: watch your plans and your results when doing these conversions if there is a left or right join playing in the game.

As the procedure is defined, it moved the problem inside the loop: now we are deleting each employee, seeking it by primary key, so let's apply the last step:

create procedure del_idle_employees as declare variable dbk char(8); begin for select e.rdb$db_key, e.emp_no from employee e left join employee_project ep on e.emp_no = ep.emp_no where e.job_code in ('Eng', 'Mngr') and ep.emp_no is NULL into :dbk, :emp_no do begin delete from salary_history where emp_no = :emp_no; delete from employee where rdb$db_key = :dbk; end; end

If you still are wondering what has been improved after these successive steps, then let's check the most important achievements:

Initially, we needed two loops and the one to clean salary_history was not simpler than the code put in the final procedure.

Nested queries are expensive in Interbase. They are much better handled as joins. Even a NOT IN clause can be faster if rewritten as a LEFT JOIN as shown above. There are exceptions, of course.

Salary_history must use the primary key of employee to locate records because there can be several records per employee as his/her salary has changed. Employee can take advantage of rdb$db_key to delete the record found in the join no index overhead.

In case it's not enough clear, rdb$db_key cannot be used across tables: even if you do a join of two tables using indexed fields, you cannot compare rdb$db_key. In the example, above, there's a join between employee and employee_project. Even if it was an inner join (the default join type) instead of an outer join (left, right or full join), when e.emp_no=ep.emp_no is met, e.rdb$db_key<>ep.rdb$db_key will be true: rdb$db_key is a raw record position and therefore, two tables cannot use the same physical space in the database.