Extracting metadata from IB, overview.(Or the tale on how to overcome several quirks.) This document intends to address basic problems that developers face when extracting metadata from IB databases. The original motive is the observation that currently (at the time this is written, February 2001), no known tool is able to extract full metadata in the right order in all cases so that the generated script can be compiled in IB to create again the same metadata. So, the only solution for this problem continues being to backup only metadata and restore the db structure later for reuse. This won't help tools that want to perform a textual comparison or users that want to spot differences at a first glance. The first hurdle is that system tables never have gotten a decent explanation in LangRef. There are not only outdated parts but missing details. When you read that some fields are not used by SQL objects, probably you should interpret that SQL's DDL commands don't touch or don't care about those fields. Those and other few fields aren't explained at all. Also, at least three system tables (rdb$formats, rdb$pages and rdb$view_relations) aren't need to extract metadata and another one (rdb$security_classes) is written by the engine but not used by the SQL support mechanism, since it predates support for SQL in IB. The second problem is minor lack of uniformity. Some system tables use a NULL flag to indicate user objects and others use a zero flag to indicate them, so better you check for both. Ivan Prenosil has pointed out that even in two procedures created consecutively, the flag can be zero or NULL. Without reading about the whole history of IB, it's not evident why those inconsistencies developed over time (and even after reading the history, the doubts remain). IB supports domains but table fields aren't required to use a domain. However, in this case, IB creates an implicit domain that's not a system object and the only way to tell it's implicit is to adhere to the convention that those domains start with 'RDB$'. Also, stored procedures do not allow domains used in their parameter specification, but they use the domains table (confusingly named rdb$fields) just to keep the length and the character set. Since a table's field can override the domain's specification, you should take both into account and even more, a rogue user can alter an implicit domain or impose a new restriction on a field that depends upon an implicit domain. Our third problem is that system tables do not convey all the information needed to get a full script that takes into account all the relationships in a working database. Going further, IB itself doesn't know enough about the databases it operates on behalf of the user, so it's possible to drop objects that are required by other objects inside the same database and you'll notice the damage only when you try to access the objects that were left with dangling references. Dependency information is scattered in rdb$dependencies, rdb$ref_constraints and the BLR of triggers and procedures. Since IB doesn't implement DECIMAL according to the standard, DECIMAL and NUMERIC are the same in practice, because NUMERIC is allowed to be engine dependant and DECIMAL behaves the same. Worse, until ODS9 (IB5.X), IB didn't know what was the precision of a NUMERIC or DECIMAL field, so when it was extracted, it was converted to either smallint, integer or double precision or a NUMERIC holding the maximum precision allowed by the underlying storage, that often was greater than the original specification. With the advent of IB6, things went worse or better, depending on the standpoint. Generally speaking, IB has been able to read databases made with previous versions. However, in this case, the developers decided to make it only compatible with its own ODS10 and in exchange, they provided dialect 1 to emulate IB5 (but it's not a complete emulation), dialect 2 to be used as a debugging tool and dialect 3 to adhere to SQL rules and exact numeric fields. In dialect 1, single and double quotes are equivalent, whereas in dialect 3, identifiers use double quotes and literal strings, single quotes. You are responsible for double quotes around names (only accepted in dialect 3), but if an identifier contains embedded double quotes, you should discover them and create the escape sequence to write to the script. A database created with IB6 (ODS10) will make use of a new system field, rdb$precision, to keep the precision of NUMERIC and DECIMAL fields. Even though it's useful for more accurate metadata extraction, the difference betwen NUMERIC and DECIMAL subtype codes is acknowledged only in the documentation and not in rdb$types (beware you need a lot of left joins on system tables to be sure you won't miss fields) and also, IB won't enforce this precision in practice. (Really, IB implements only the behavior of NUMERIC, not DECIMAL according to the SQL standard.) Worse, initial release of IB6 and current releases of Firebird are able to read ODS9 thanks to IB5's code, where this rdb$precision doesn't exist, so you'll want to check the server version and the ODS. But if you upgrade your database to ODS10 by means of backup/restore, rdb$precision will be NULL for fields already created and only will have values for fields created after the upgrade. But developers face even more challenges that hide inside IB since its inception. First, the existence of COMPUTED fields allows you to define pseudo fields that aren't stored in the database but that are calculated from the provided expression when they are requested. I was the first surprised person when I learned that in the SQL standard, there's nothing equivalent to them, so they are proprietary. Usually, COMPUTED fields are based on the value of other fields in the same row, but a kind of escape sentence allows the COMPUTED BY expression to reference a full column, another table or view or even a stored procedure. Of course, these dependencies aren't tracked by IB, so if you drop a procedure that feeds such dynamic column, you will realize your mistake later1. Another old capability that's supported contrarily to what the manuals say is the possible subversion of the natural dependency of tables from domains. Since a domain supports only one CHECK CONSTRAINT, you can use the same trick than with COMPUTED fields and reference another table, view or stored procedure on the check, making such domain to become dependant on a table or stored procedure through the check. Extending the same reasoning, you can see that any CHECK CONSTRAINT imposed on any table's field can reference another object and so create a subtle dependency that makes metadata extraction a real nightmare. Also, it's interesting to note that in late 2000 the Firebird project fixed an assumption in the built-in ISQL utility about the database-wide character set when creating an script, that such assumption was probably since 1994 at least and it affects third party utilities as well. The documentation states that a VIEW cannot be based on a stored procedure and the engine seems to confirm such rule, because you can't put a procedure name in the FROM clause of a CREATE VIEW command, but you can put a procedure in the WHERE clause of the same command, so what's the truth here, is that considered a dependency or not? At least, by peeking at rdb$dependencies, one can say that IB considered the relationship, so the open question is still if such construction is legal or not. To put another point in discussion, IB currently allows you to define any kind of constant value as a default for a field or domain whether it has sense or it's compatible or not. If you want to take the trouble one step further, you should remember that the human-readable fields that contains defaults, check constraints, procedures and triggers text, views text and descriptions can be wiped out and the database will be read and used by IB normally, so in the ultimate case, you will want to parse BLR to know all the information. ©Claudio Valderrama, February 2001. Notes: |
This page was last updated on 2001-04-12 18:56:35 |