Back
Home
Up
Next |
|
How to avoid the infinite recursive insertion.
This document discusses a workaround for a problem that some people have discovered and that it doesn't affect other engines. Among SQL statements, there's a provision to insert rows from one table into another. One syntax some engines accept is
select <fields> from tbl_source into tbl_target
but it's not supported by IB, that offers in exchange the more limited construction
insert into tlb_target(<fields>) select <fields> from tbl_source
where the VALUES keyword cannot be used, but a SELECT statement goes after the name of the target table and the optional list of fields to populate.
I wrote "limited construction" because the SELECT that feeds the table cannot have an ORDER BY clause, for some specialized needs I've read about, so in this case, the solution is to write a stored procedure. The feeding SELECT can't have UNION instructions, too. You're allowed to invoke gen_id(), in case you require a value produced by a generator.
The need to feed a table from itself arises sometimes and here's the case where people suffered from bad experiences: the self-insert construction, depicted as
insert into tbl(<fields>) select <fields> from tbl
or even the simplest but error prone syntax
insert into tbl select * from tbl
causes an infinite recursion. When IB does the select, it doesn't get the full rowset into a buffer, since it's working in the server's space itself (data doesn't travel to the client) and there's no need to sort data. Hence, it starts reading sequentially the records from the table, but as it reads, it's pushing new records into the same table thanks to the INSERT command that receives the results of the SELECT command. It's evident that with such schema, IB will never hit the end of the table, since at the time it goes to the last record that existed before the command was executed, at least one new record has been inserted at the tail and IB picks it, inserts it and goes for the next record and so on, until the partition fills and IB displays an error or you kill IB. Probably you will have to validate your db after this hard stop to ensure there's no corruption. Of course, if your table is empty or if the WHERE/GROUP BY clauses of the SELECT don't bring any record, you're saved but only because the command didn't add any row.
I've seen some naive attempts to write a procedure to save the situation. Let me say that a code like this:
set term ^;
create procedure proc_ins
as
declare variable temp <data type>;
begin
for select field from tbl into :temp do
insert into tbl(field) values(:temp);
end ^
set term ;^
is not guaranteed to work, because the procedure doesn't do caching by itself: it takes the records one at a time and inserts them, so we have the same problem as with the special INSERT syntax shown at the beginning. I've seen some weird caching effects in procedures in IB5 with two nested SELECT statements, for example, but I wouldn't rely on them as a feature; I consider them a bug until a clear and very detailed of the mechanics of procedures is available.
But the observation made above is the workaround: when records are read sequentially and pumped into the same table, there's no buffer because there's no need to sort data. Then, one has to force a sort of data, so the result set is created before the insertion takes place. It seems fairly easy, but we have a problem that was mentioned before:ORDER BY is not allowed in the "insert from select" syntax, so if you try to do
insert into tbl(<fields>) select <fields> from tbl order by <order_fields>
the sentence will be rejected. We are left with two workable workarounds: writing a procedure, so we can modify the former proc to include an ordering clause or forcing an implicit sort to satisfy the GROUP BY clause.
Now, I will present a very trivial example, but it can be applied to more complex scenarios with the conditions I will define after it:
create table rec_ins(a int);
insert into rec_ins values(0);
insert into rec_ins values(1);
insert into rec_ins values(2);
insert into rec_ins values(3);
commit;
You could attempt
insert into rec_ins
select a from rec_ins
but don't do it, because you will have the infinite recursion! You have been warned, this is the problem we need to overcome. If you try to see the query plan that IB is using, it's not available. So, if you want to be sure a sorting will occur, please prepare the SELECT part of the command and observe the PLAN. The plain
select a from rec_ins
produces PLAN (REC_INS NATURAL) so it's not what we need. But
select a from rec_ins order by a
produces PLAN SORT ((REC_INS NATURAL)) that's what we need: a sort as the last step (outer step) in the query. Since we can't use ORDER BY, let's try grouping:
select a from rec_ins group by a
and it produces PLAN SORT ((REC_INS NATURAL)) that's good again, so you can run the command
insert into rec_ins select a from rec_ins group by a
with the confidence that it will stop when all the original records have been processed. You should remember, however, the GROUP BY eliminates duplicate values in the fields passed to it, so be careful. Also, in this case, if field "a" has several NULLs, the grouping only produces one row as a result: it considers all non-values as identical.
Warning: if IB uses an index to answer the GROUP BY, then there's no sorting and you need to refine your sentence. For example, if the table shown above would have an index, IB could answer something like this:
PLAN (REC_INS ORDER RDB$PRIMARY22)
that means the index (in this example, named rdb$primary22) is being used the do the grouping. It doesn't serve our purposes, we need a sort. Let's use another example:
create table rec_ins_idx(
a int not null,
b int not null,
unique (a, b));
commit;
insert into rec_ins_idx values(0,0);
insert into rec_ins_idx values(1,1);
insert into rec_ins_idx values(2,2);
insert into rec_ins_idx values(3,3);
commit;
Now that we have the values in place, we know that grouping in a way the index can be used doesn't help us:
select * from rec_ins_idx
group by a, b
because the generated plan is the following:
PLAN (REC_INS_IDX ORDER RDB$24)
so the index walk solves the problem instead of a sorting.
But as the index only can be used if the sort is grouping is requested on the fields that make up the index in the same left-to-right order they appear in the index definition, simply reversing the order produces a real sort instead of an index scan:
select * from rec_ins_idx
group by b, a
with a desired query plan
PLAN SORT ((REC_INS_IDX NATURAL))
Now, we can write our full sentence:
insert into rec_ins_idx(a, b)
select a, b from rec_ins_idx
group by b, a
In this case, there's no danger that the grouping will return less records than the table has, since the unique index was defined spanning both fields and so every row is different by definition.
Warning: do not try to force the usage of a sort file by creating a manual PLAN clause that specifiea the SORT modifier in such plan, because IB5 documentation says that IB5 reports SORT information in the PLAN but the SORT modifier is ignored in a manually specified query plan. It's accepted, but it won't force a sort of the internal stream if not needed really. And if you do not have a sorting step, you will be caught by the recursive insertion.
If the GROUP BY clause becomes tricky or impossible to apply, we need to resort to an ORDER BY clause, but since we can't put it in an insert/select statement, we need to return to our stored proc based solution, specially if you expect the command to be invoked often. First, the command
select a, b from rec_ins_idx
order by a
produces as expected
PLAN (REC_INS_IDX ORDER RDB$24)
that's no solution for us. But we know that forcing a sort in a column that is not indexed produces the desired effect:
select a+0, b from rec_ins_idx
order by 1
and the plan is
PLAN SORT ((REC_INS_IDX NATURAL))
because the first field is now a+0 and it's an expression. IB doesn't go to the extreme of understanding that "a+0" and simply "a" are the same, so it doesn't use the index for sorting.
Now, the last task is to produce the procedure:
set term ^;
create procedure proc_ins_idx
returns(c int)
as
declare variable a int;
declare variable b int;
begin
for select a+0, b from rec_ins_idx order by 1
into :a, :b
do begin
insert into rec_ins_idx(a ,b) values(:a, :b);
c = c + 1;
end
suspend;
end ^
set term ;^ I've used a return parameter (output parameter) named "c" to provide easy feedback on how much records were inserted, so the procedure can be called
select c from proc_ins_idx
to know the answer. Making the procedure selectable also avoids a known bug in IB when a procedure is executed repeated times. Some people may want the effect of an insertion that at the same time outputs the values:
set term ^;
create procedure proc_ins_idx2
returns(c int, a int, b int)
as
begin
for select a+0, b from rec_ins_idx order by 1
into :a, :b
do begin
insert into rec_ins_idx(a ,b) values(:a, :b);
c = c + 1;
suspend;
end
end ^
set term ;^
Other engines don't have this recursion problem, so this can be seen as a limitation or a bug that must be overcomed until it's fixed.
©Claudio Valderrama, Jan-2001.
|