The mystery of rdb$db_key II.

Probably the first questions that come to mind are why a person would want to use db_keys, and when and for how long they are valid in the database. As a general answer, people drop to lower levels when they can't do what they want at higher levels of abstraction, or when they need better performance than standard SQL gives them. Now, on IB specifically:
The duration of db_keys is a point that's not very clear in the manuals unless you read the API Reference Guide with a sharp eye. By default, a db_key is valid only for the duration of the current transaction. After you use Commit or Rollback, the db_key values you had are no longer accurate. If you are using auto-commit, you can still use db_keys, because this mode relies on an implicit CommitRetaining: the transaction context is retained, so garbage collection cannot proceed, and hence the db_keys remain valid until you do a "hard" Commit. At the risk of being repetitive, the explanation is that when a txn is committed, garbage collection may proceed, so old record versions can be collected and "compacted", namely, marked as free to use. Also, another txn might have deleted a record you were using, while your txn isolation level prevented you from being aware of this change. A db_key grabbed while your txn was active can now point to a record version that doesn't exist anymore. If you took a long time to update the record identified by the db_key, you'll want to check that the record has not been changed by another transaction in the meantime. IB helps track some of these conflicts; see the writing behavior among conflicting transactions.

You can change the default duration of db_key values at connection time by using the API or IBO. You can specify that the values must be kept for the entire session, that is, from the time you connect to the database until the time you disconnect. However, keep in mind that this means garbage collection is stalled for all that time: internally, IB keeps a transaction open that stops garbage collection. In highly interactive environments, this may result in your database file growing at a high rate and operations on the database becoming slower and slower. As a corollary, keeping an auto-commit transaction working all day, using only CommitRetaining, or simply leaving a transaction uncommitted for several hours without need is not a wise solution.

Now, we will briefly analyze the four points on using db_keys that were shown above as possible advantages or workarounds:

Point a) A db_key is faster than a PK: one level less of indirection. However, if the database pages needed are in main memory and the requested subset is small and exhibits locality (all the records are near one another), the difference in access speed shouldn't be noticeable.

Point b) Sometimes indexes are turned off, or not defined at all, for efficiency reasons. For example, a table used as an audit trail may be left unindexed so it imposes only a small penalty when inserting the actions and data related to the operations on other tables. Should you need to make interactive alterations to the data, your only point of identification for a record is its raw position, i.e., the db_key. When there's no key, IBO automatically uses the db_key. This method works well for updates and deletions. However, newly inserted records cannot be shown: how do you know what value of the db_key was assigned to the new record? You can only refresh your query and fetch the records again.

Point c) The IB optimizer still has trouble with some statements. If you try to run a statement like the one sketched below against a huge table, you are likely to find performance problems.
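A minimal illustration of the kind of statement meant here; the schema is an assumption of mine (a master table A with key id and a total column, plus a detail table C with a_id and amount), not a listing from a real database:

    update A
    set total = (select sum(C.amount)
                 from C
                 where C.a_id = A.id);

The correlated subquery is re-evaluated for every row of A, and there's no way to attach an explicit plan to it.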
If you run the same operation often, then it's worth the time to write a stored procedure doing the work, as sketched below.
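A sketch of such a procedure, under the same assumed schema; note that the update inside the loop still goes through A's primary key:

    set term ^ ;
    create procedure update_totals
    as
    declare variable vid integer;
    declare variable vtotal integer;
    begin
      /* one grouped pass over the detail table... */
      for select a_id, sum(amount) from C
          group by a_id
          into :vid, :vtotal
      /* ...but each target row is still found through the PK index */
      do update A set total = :vtotal
         where id = :vid;
    end ^
    set term ; ^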
Although speedier, it still has the problem that the records in A have to be located by PK each time a new pass of the for/do loop happens. Some people claim better results with the alien syntax sketched below.
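Reconstructed as a sketch with the same placeholder names, the alien form walks A by rdb$db_key and locates each row to update by its raw position instead of by the PK index:

    set term ^ ;
    create procedure update_totals_dbk
    as
    /* the db_key of a single table is 8 bytes */
    declare variable dbk char(8) character set octets;
    begin
      for select rdb$db_key from A
          into :dbk
      do update A
         set total = (select sum(C.amount) from C
                      where C.a_id = A.id)
         where rdb$db_key = :dbk;
    end ^
    set term ; ^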
The last form seems to have some advantages: the target record is located directly by its raw position, skipping the PK index lookup on each pass, and the technique works even on a table that has no key defined at all.
Usually, it's not possible to use an explicit query plan in insertions, updates or deletes, so writing a procedure seems to be the only way to proceed in case you get slow performance. You may wonder why I include insertions. Well, an insert statement is not constrained to a single sequence of values sent by the client application. In fact, it can be very complicated, as in "insert into this table the summary, average and count of distinct values from the join of these two tables while condition C.e = somevalue is met, but only if the item inserted is not already in this table; otherwise perform an update". There's a special syntax for inserting the results of a select statement, shown below.
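For instance, assuming placeholder tables B (holding the detail values) and C (carrying the filter column e), joined and summarized into A:

    insert into A (grp, total_val, avg_val, num_distinct)
    select B.grp, sum(B.val), avg(B.val), count(distinct B.val)
    from B join C on C.b_id = B.id
    where C.e = 'somevalue'
    group by B.grp;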
Of course, the update part has to be left for another statement. You can change the plain insert into a procedure that decides between inserting and updating, as in the sketch below.
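A sketch of such a procedure, again with placeholder names; the two lines between square brackets are an editorial device discussed next, not SQL syntax, and here they simply replace NULLs with zero (the right substitution depends on your data):

    set term ^ ;
    create procedure merge_b_into_a
    as
    declare variable va integer;
    declare variable vb integer;
    begin
      for select B.a, B.b from B
          into :va, :vb
      do begin
        [if (va is null) then va = 0;]
        [if (vb is null) then vb = 0;]
        /* insert the pair if it's new, otherwise update it */
        if (exists(select * from A where A.a = :va)) then
          update A set b = :vb where a = :va;
        else
          insert into A (a, b) values (:va, :vb);
      end
    end ^
    set term ; ^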
It might be the case that B.a and B.b are allowed to be NULL, and in this case such a procedure would be the only native way of dealing with that condition before the insert or update. This is the purpose of the two lines between square brackets: the brackets mean the lines must be enabled if NULLs can occur. Remember that the accepted comment delimiters in InterBase procedures and triggers are the same as in the C language, so if you really want to leave these lines in the code but without effect, you must enclose them between /* and */ so they are bypassed. Conversely, count() can never be NULL, because it only counts records. This is a rule you'll want to remember.
If you don't mind writing convoluted statements, you can count all NULLs as one more occurrence in a "distinct" clause, either with a weird syntax or in a more standard way; both are sketched below. So, I hope you understood that in the worst case, the result of count is zero, but it can be neither NULL nor negative.
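Assuming a placeholder table T with a nullable column f, and using rdb$database as a convenient one-row table, the weird form could read:

    select count(distinct T.f) +
           (select count(*) from rdb$database
            where (select count(*) from T where T.f is null) > 0)
    from T;

and the more standard one:

    select count(distinct T.f) +
           (select count(*) from rdb$database
            where exists (select * from T where T.f is null))
    from T;

Either way, the parenthesized term contributes exactly 1 when at least one NULL exists and 0 otherwise.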
This page was last updated on 2000-09-19 03:18:43