The Rule Challenge

I’ve made this challenge on IRC several times, and I think on the mailing lists too, but I’m putting it up here for reference.

I’m not a fan of rules. I’ve used them in production code in the past, though I’d avoid doing so again, and (unlike most people who end up using them) have spent a fair bit of time analyzing their behaviour and effects. I’ve come to the following conclusion:

All non-trivial rules are incorrect.

I define “non-trivial” as follows:

  1. Any rule with multiple action statements is non-trivial.
  2. Any rule using DO ALSO is non-trivial.
  3. Any rule with a WHERE clause is non-trivial.

This pretty much only leaves rules which rewrite into at most one simple statement, which I’ll concede can sometimes be written correctly (but often are not).

I define “incorrect” as “produces surprising or destructive behaviour in response to ordinary SQL statements”. In general this requires SQL statements not forseen by the rule writer; it’s easy to write rules that work only for certain statements, but doing this in real code amounts to laying a minefield for your future developers.

My challenge is simple: show me a counterexample to the above claims. Any takers?

[EDIT: to clarify, by “rule with a WHERE clause” I mean a conditional rule with WHERE used in the rule definition, not merely a rule whose action statement has a WHERE clause in it.]

This entry was posted in Peeves, PlanetPG, Postgres. Bookmark the permalink.

15 Responses to The Rule Challenge

  1. Tom Lane says:

    Rules work all right for views (ie, ON SELECT DO INSTEAD SELECT cases). There’s no question that every other case sucks. The *real* question is how to do better?

  2. Andrew says:

    That is indeed the real question, but it’s one for another blog post rather than this one :-)

  3. Robert Young says:

    I’m not principally a PostgreSQL folk, DB2 is my main squeeze, but I do follow it a bit. The whole rules thing is something only PG did, and that was Stonebraker’s doing. One would have to read up his rationale way back when. From what I can find, View support is the justification for using rules, since that is the only way to get it. There have been a few links from the PostgreSQL site to rule posts such as this recently. I wonder why rules have become a topic du jour?

    The challenge as written isn’t provable; the counterexample is exhaustive enumeration of all non-trivial rules.

    As to better, one of those linked posts asserts that triggers are better. I would agree with that.

  4. Andrew says:

    We’ve had more than the usual number of rules questions on IRC recently, at least that’s my impression; that would probably explain the multiple recent blog posts. (And I only wrote this one because someone on IRC bugged me about it.)

  5. Steffen says:

    A few years back, I faced a very sophisticated, rapidly evolving schema. The domain was comprehensive IT management (with some big-data-like, telemetry sampling/management thrown in). This schema was more sophisticated that >99% of the schema I’ve ever seen (or ever want to see again ;-).

    * Multi-level security (akin to Oracle’s OLS)
    * Bi-temporal audit trail
    * Overlapping/intersection-based object relational mapping (ORM)
    * Schema federation (and multi-schema facades)
    * Schema evolution

    The only database we could find with nearly enough raw expressive power was PostgreSQL. PostgreSQL’s SQL-rewriting RULES were a game changer. From our perspective, this sort of ‘active schema’ facility opened a door to some really nifty things. As I see it, PG’s SQL Rewriting RULES put PostgreSQL way ahead of any alternative database (relational or otherwise). Let me try to be a *little* less hand-wavey.

    From memory, we first overlapped persistent objects. What we did was to have the shared members/attributes of an object go into a ‘base’ table. Then, there was zero or one matching records, in one or more other tables, that carryied the specialized members/attributes added by sub-classes of that base class/type. We supported only single inheritance this way – but at the expense of yet more complexity – multiple inheritance could be supported.

    At any rate, SQL-rewriting RULES did allow us to decouple the base classes, and the tables used to persist instances of such base classes, from arbitrary, uncoordinated specialization.

    Beyond this basic ‘schema’, an arbitrary level of ‘further’ schema specialization was relegated to distinct development teams (operating in different time zones, languages, cultures, etc.). The idea was to minimize the need for coordination/communication about schema specializations. Different teams took the core/base/canonical schema du jour – and they all ran with it in parallel.

    Then, about once a year, we would promote (or absorb) any duplicated-via-specialization members/attributes/triggers/procedures into the core/base/shared schema. Actually, we ended up with a hierarchy of such layered-by-specialization schemata.

    I recall having to work around the lack of upsert (aka merge). When an instance of specialized object was created by the application layer logic, we often needed to either update the existing, and corresponding, base class record or insert a new one. As I recall, we used a stored procedure. Long story.

    Coming from a heavy Java/Hibernate (ORM tool) background, we recognized that we were implementing a novel form of object persistence (well – novel at least in so far as Hibernate team/doc surveys the various options for mapping classes onto relations/tables).

    Via this process, we freed large development teams to specialize a base schema … while keeping such specializations effectively ‘private’. There was no possible coupling to other, similar, specialized schema – unless and until the core/base schema was changed. That sort of shared schema/core/base change was carefully managed – and it happened only about once per year.

    At the end of the day, this process made parallel schema evolution practical. For a very large scale, distributed software engineering organization, this mattered. Sure, there aren’t very many software engineering operating at this global 500 scale – nor do many of those attempt a shared, core/base schema of this complexity – but perhaps that is because not too many folks recognize how much power SQL-rewriting RULES puts into the hands of schema designers. It seems to me that there may be something of a catch-22 here.

    Back then, once we recognized how much of a game-changer PG’s SQL-rewriting RULES could be, and how they affected what we (internally) dubbed an ‘active’ schema, we started to see quite a few other applications (for ‘active’ schema). For example, at one point, we needed a super strong, generalized audit trail. In IT management, keeping track of just who changed just what and when is critical.

    To use Richard Snodgrass’s (temporal SQL) lingo, we had already partitioned valid/effective dated ‘history’ records into distinct tables. For fairly obvious performance reasons, we needed to physically separate (old valid/effective dated) history (records) from current data (records). And given the nasty, inherently complex nature of this (comprehensive IT management) domain, we also required a fundamentally distributed system – with resilience to network partitions, eventual consistency and all that nasty stuff.

    So, we found RULES worked once again like a swiss army (schema) chainsaw. Given how primitive (or non-existant) the development tooling around PG RULES, none of this was easy. However, we managed to effect a second dimension of partitioning along the transaction/knowledge dimension. The result was bi-temporal audit trail, partitioned by valid/effective date ranges/periods and then further/recursively/sub-partitioned by transaction/knowledge date ranges/periods.

    We also used RULES to create a ‘matching’ (active) schema to simplify both interoperability-oriented, ETL-style mappings/interfaces. After all, if we could coax PG into presenting a schema (consissting of dozens/hundreds of write-able views) that looked like some source (or industry standard) schema, but mapped in non-trivial ways, to our canonical/base/core destination schema, everything was 1000x easier. For instance, there are some intensely complex, putative industry-standard, IT management schemata out there for the IT management domain. Think about the common information model from the distributed management task force or the SID model from the Telecommunications Management Forum or …

    One thing for sure. Without SQL-rewriting of a single SQL statement into many SQL statements, this kind of thing just cannot be done … except for cases where the source and destination schemata are largely (and significantly) conformal to begin with.

    Best of all, we could create more than one such ‘in-the-RDBMS’, active-mapping schemata … to simultaneously appeal to an ever wider variety of end-user preferences, diverse industry standards, etc. And on top of that, we found the same trick handy for producing simplified, ad-hoc reporting schemata. This made it far more practical for our intensely complex core/base/canonical schema to be exposed via simplified schemata. We found simplified schemata useful for both sub-domain use and for perspective-specific, schema-level ‘views’ (of the underlying, core/base/canonical schemata). For end-users demanding ‘simplicity’ – with utter, wanton disregard for the inherent, underlying complexity of the IT management domain – we found PG’s SQL-rewriting RULES to be a god send.

    While PostgreSQL seems uniquely distinguished by it’s avant-garde support for SQL-Rewriting RULES, and while I feel RULES makes PostgreSQL schemata 100x more expressive, there’s no question that the door to applications of such ‘active’ schema technologies is just barely cracked open – and only a little. Today, these sorts of needs are either avoided – or when pressed – just sucked up into application layer logic.

    SQL-Rewriting takes a database into a feature/function level, and into new value-add, that most folks just do not consider the province of databases at all. There is almost a reflexive reaction to put the database back into the traditional database ‘box’ (of handling persistence, ACID transactions, indexing, caching, RBAC, etc.). Generalizing into the level of SQL-rewriting is just too much encroachment into the bailiwick of analytical/BI tools, custom reporting, ETL/warehousing tools, etc.

    Going so far beyond the ‘database’ box is most disorienting … and easy to misapprehend … and actually … easy to miss altogether.

    It is almost as if some ‘active’ (SQL-rewriting) schema layer (or computationally rich schema-level view layer) might sit somewhere below the application layer and the above the ‘simpler’ (and necessary) underlying (mostly persistence-oriented) database layer.

    When SQL-rewriting is combined with computationally complete stored procedures, triggers and such … a whole host of things become both possible and practical.

    The litany of interoperability-oriented, federation-enabling, etc. opportunities are fascinating. But one thing for sure. None of these applications is very widely appreciated. None seem to be well documented. None seem to have exemplary ‘reference’ implementations. It is as if the other shoe is still waiting to be dropped.

    To the degree that this might be true, it would remind me of object-oriented programming notions back in the days when they were first implemented -in Simula68 (as in 1968). It wasn’t until the late 1980’s (or even early 1990s) that object-oriented technology broke into the mainstream. I suspect SQL-Rewriting RULES are in something of a similar ‘recognition limbo’.

  6. elein says:

    The original rules were meant to facilitate updateable views. When you have an updateable view where the updates do not match the automagic of our currect updateable views, you need to write a rule. For example, an update RULE on a view which includes a key can prevent key updates. With a view with many joined tables may want to enable updates on a few columns in a few tables. The idea is that you may not want a proscribed updateable view. A master-detail view can allow changes, for example, on only the detail tables.

    The other reason for rules are all the reasons for a view, which is a RULE.
    –ex-illustra

    • Andrew says:

      Views are quite limited forms of RULEs – in particular, they don’t allow more than one rule, or more than one action, or any WHERE clause in the way that rules do (the WHERE clause of a view itself is a different thing). So they can never qualify as “non-trivial” in the sense I used above.

      But while rules can be used to make views writable, the fact is that in all non-trivial cases the result is in some respect flawed: some queries that cause the rule to be invoked will produce unexpected and usually undesired results which are not easily predicted in advance.

      Now that we have INSTEAD triggers on views, any case where you want a writable view that’s not covered by the default magic, you should be writing a trigger rather than a rule, since it’s very easy to understand what the trigger will do and how to write it correctly.

  7. chAlx says:

    Agree with elein: RULE is great tool, despite someone can use it wrong. Meaning UPDATE rules most of all (rather trivial in terms of the article).

    Triggers are cool too, but far from “easy to understand” what will happen while looking on the “outer” sql only. Btw most non-trivial triggers are bad in the same manner as rules :)

    There is a problem with this small blog post: people are referencing it as a holy doc. And Tom Lane’s comment is even more confusing. Maybe the positive example could clarify the subject.

    • Andrew says:

      You didn’t actually answer the challenge: can you show an example of a non-trivial rule that is, in your opinion, correct?

      • chAlx says:

        That wasn’t an argument. My view rules are trivial: update some fields of the table instead of updating everything in a view (maybe with joins). Usually it’s a single row but “update view set updated_at=now()” is ok too.

        It’s just a lot of misunderstanding with this topic, especially after “every other case sucks”. One simple conclusion was not disclosed: Many trivial rules are correct.

        • Andrew says:

          An update rule on a view that has joins is probably not trivial by the definition given in the post – if you have an example of one, I’ll show you how it fails.

          Given how long we’ve had INSTEAD triggers on views, there really isn’t any excuse not to use them.

          • chAlx says:

            Well, here it is:

            CREATE OR REPLACE VIEW view_hosts_ruled AS
            SELECT
            h.hid, — integer, pk
            h.htid, — integer, ref
            ht.htname, — varchar
            h.name, — varchar
            h.address, — inet
            h.hsyslocation, — varchar
            h.alias, — varchar
            h.register, — boolean
            h.firmware_version — varchar
            FROM hosts h
            LEFT JOIN hosttypes ht ON h.htid=ht.htid;

            CREATE OR REPLACE RULE view_hosts_ruled_up AS
            ON UPDATE TO view_hosts_ruled DO INSTEAD
            UPDATE hosts SET
            name = new.name,
            address = new.address,
            hsyslocation = new.hsyslocation,
            firmware_version = new.firmware_version
            WHERE hosts.hid=old.hid;

            — Test:
            update view_hosts_ruled set hsyslocation=hsyslocation where hid=1;

          • Andrew says:

            OK, that one is trivial as per the definition in the post: one unconditional INSTEAD action. Additionally, I guess the join against hosttypes is unique, so there’s no question of duplicate rows.

  8. chAlx says:

    That’s what I meant: trivial rules are mostly correct :)

    Well, sorry for offtop; the post was not about them. But many treat it as ‘All rules are bad, devs said’. People prefer to believe the shepherd than to think for themselves…

    • Andrew says:

      That it’s possible to write trivial correct rules doesn’t mean it’s a good idea to do so; I do indeed maintain that all* rules are bad and I believe that most PG devs agree.

      The biggest problem is that once you start using them, it’s not obvious where the landmines are. Also, once you cross the “trivial” threshold, rules will often appear to work, but fail on specific types of queries that you didn’t anticipate (and might not currently be using), leading to hard-to-diagnose failures in the future.

      *- “all rules” here excludes views themselves

Comments are closed.