Tuesday, June 27

Should we drink DHH's Kool-Aid?

David Heinemeier Hansson is a smart and charismatic guy, but that doesn't mean he's always right. I had always thought, apparently through ignorance, that the lack of support of foreign keys and composite keys in Ruby on Rails was due to its immaturity and was something that was on the horizon. But David set the record straight in his keynote at RailsConf -- he thinks they're old-school, unnecessary, and has no intention of supporting them. I believe his words were "Why should we bend to work with them? Let them bend to work with us!" That got me thinking.

I'm not one to make rash decisions. I'm always trying to look at an issue from all conceivable perspectives before weighing in on an opinion. So when I heard mister Heinemeier Hansson's shocking declaration, I had to stop and reconsider my enterprisey religion. I've been weaned on foreign keys and composite keys. It's been beaten into my brain that they are The Right Way (tm) to accomplish a Good Thing (tm). How could these rock-solid foundations of enlightened database design be so nonchalantly tossed aside? I was breaking out into a cold sweat.

Let's agree on an example schema before I get into the nitty gritty. I'm going to stick with the same objects and relationships I've used in my last dozen rants. We've got Collectors. Collectors possess Collectibles. Possessions store this relationship. The tables look like this (generated by a migration script):



Now my old-school enterprisey noggin immediately notices that:

1. Possessions.Collector_ID should be a foreign key to Collectors.ID

2. Possessions.Collectible_ID should be a foreign key to Collectibles.ID

3. Collector_ID and Collectible_ID in Possessions should be a composite key.

4. WTF is that ID column doing in Possessions!?

Let's tackle 1 and 2 first. Why should they be foreign keys? For data integrity! If the database doesn't enforce the reference between the two tables, naughty code might create records in the Possessions table that don't actually refer to existing Collectors or Collectibles. That would be bad.

What happens if you do in fact make them foreign keys[1]? Well, the most annoying thing is that your tests stop running because fixtures aren't smart enough to load in the proper order[2]. It tries to load dependent records before their dependencies. Not fun.

So for the sake of argument let's consider dropping the foreign keys. Wow, I just got a cold chill down my spine. I feel dirty just for having typed that sentence. But I'll press on...

As I've already mentioned, without the referential integrity we can get bogus records in this table. That irks an obsessive compulsive anal-retentive person like myself, but I'll concede that it's not the end of the world. The system can be coded to continue to operate with bad data. Shudder.

But there's another can of worms still to open, point number 3. If the Collector and Collectible IDs in Possessions are not a composite key, we can have multiple records representing the same relationship! That's very bad. But again, I'm going to try to be open-minded here. We can, at the very least, prevent that by making the two columns a "unique" constraint. That's still allowed within the Rails realm, isn't it? Well, not really, which brings us to point 4.

Rails, and DHH, believe that the one-primary-key-per-table world is a safe and workable place to be. This is why the Possessions table has a unique ID column. OK, I can live with that, I think. But here's where things start to get sticky (or stinky?) and I need to add some code here to demonstrate the problem. Here's the models:
class Collector < ActiveRecord::Base
has_many :possessions
has_many :collectibles, :through => :possessions

class Collectible < ActiveRecord::Base

class Possession < ActiveRecord::Base
belongs_to :collector
belongs_to :collectible

And here's a little test to create a Possession:
collector = Collector.find(19) 
collectible = Collectible.find(72)

possession = Possession.new
possession.collector_id = collector.id
possession.collectible_id = collectible.id

And here's what we have in the database:
mysql> select * from possessions;
| id | collector_id | collectible_id | quantity |
| 19 | 19 | 72 | NULL |
1 row in set (0.00 sec)

So far so good. But what happens when we need to delete a Possession? Here's the code:
possession = collector.possessions.find_by_collectible_id(72)
collector.possessions.delete( possession )

And here's what we're left with in the database:
mysql> select * from possessions;
| id | collector_id | collectible_id | quantity |
| 19 | NULL | 72 | NULL |
1 row in set (0.00 sec)

Gah! That's one of them orphaned records I warned you about. What if we run the whole test again? Will it reclaim that record?
mysql> select * from possessions;
| id | collector_id | collectible_id | quantity |
| 19 | NULL | 72 | NULL |
| 20 | NULL | 72 | NULL |
2 rows in set (0.01 sec)

Nope! Now we've got two lost souls. If we'd applied that unique constraint I'd mentioned earlier to the un-foreign key columns, the database would have thrown a fit and Rails would have choked.

So what's going on here? Well, from my point of view, it seems that Rails not only dissuades me from designing my database with proper referential integrity but it also litters my tables with orphaned records. I hope that I'm wrong here and some noble Rails advocate will set me straight, but even if there is some way of correcting this behavior it's clearly not the default behavior and that goes against Rails' mantra of convention over configuration -- I shouldn't have to configure Rails to not break referential integrity and not clutter my tables with bogus data.

Josh Susser and I have been discussing this issue in our blogs and via e-mail -- and we should have discussed it at RailsConf but I was in burn-out mode and didn't have the cognitive energy -- but we're both attempting to build plug-ins to make better sense of this issue. Rather than collaborating on a single solution, we're tackling it from different angles, but one thing our solutions seems we have in common is that we both agree those orphaned records need to be deleted. In reading a copy of Josh's code (and I hope he doesn't mind me revealing this) I found the amusing comment:
# delete ??? - would need to call destroy on join model instance.

And that's exactly what I did in my plug-in:
# the default way that rails handles this is by
# "setting their foreign keys to NULL"
# this would leave orphaned records in the database,
# and I'm not cool with that
# AND since my migrations set the foreign keys to "not null" (duh!)
# this breaks hard-core
# so what I should do here is look up the unique record ID for the relationship
# (aside: why the hell do they have unique IDs rather than compound keys!?)
# and delete the record outright
@reflection.klass.delete( wrapper.id )
# TODO: I need to remove the wrapped relationship from the superclass
# collection as well or it will be in an invalid state. As a quick hack I might
# be able to simply force a reload of it from the database.

At the moment I'm torn on the issue. Do I want to "bend" to the Rails way and just live with the lack of referential integrity and bogus data? Or do I want to try to make Rails bend to work The Right Way(tm)? Obviously I'm trying the latter, and so are other prominent Railists like Josh, which to me is a good sign that this is an important issue. Hell, even Dave Thomas complained about it in his keynote at RailsConf. Thankfully I'm not yet using Rails for any commercial work so it's all academic mental self-pleasure for now. But if I were put into a position where I had to make the call, it would be an incredibly grueling decision, and it shouldn't have to be.


[1] You can add foreign keys manually in the migration script via raw SQL or use Simon Harris' plug-in. Note that his plug-in is clever enough to not apply the foreign keys to the testing database so your fixtures don't barf.

[2] There are hacks you can use to control the order of fixture loading if you really want/need it.


Luke Francl said...

Thanks for this post.

I'm also very uncomfortable not using foreign keys to ensure data integrity.

Another advantage of using foreign keys is that most databases are optimized for joins using them. By not using foreign keys, you're losing that, and need to apply indexes manually.

Jason Carreira said...

DHH's position is indefensible. Ever tried looking at a database schema for an app that you didn't write (and remember that apps spend a lot more time in maintenance than in active development) when there's no RI? It's nigh-on impossible to figure out what the hell is related to what. Foreign keys aren't just about keeping the data clean, they're also self-documenting structural elements of the database that let you look at the pretty diagrams that tell you what's going on.

Oh, and if you think J2EE is "enterprisey", DHH is really harkoning back to the "good old days" of COBOL ;-)

Michael Kovacs said...

Hey Ted,

I like your posts on rails and have had some of these same frustrations, namely the fixture loading problem and relations. The example you show here is a bug with has_many :through, as I use every other relation type and don't have orphaned records as a result of deletion.

I have one production app out there now and another one that I'm building right now. I too have issues with AR and features that are missing (e.g. inconsistent cascading operation semantics across relation types).

I've discussed this and others with Josh as the last Ruby meetup here in SF and it's good to see that there are more folks that think AR needs some more love to become as powerful, or at least approaching the power of other more mature O/R frameworks out there.

My blog is here. I post about gotchas with rails and other things I think are broken.

Damon Clinkscales said...

This is how i specify a join table migration. Note, this example is from my RailsConf talk:

class AddProgramRules < ActiveRecord::Migration

def self.up
create_table :programs_rules, :id => false do |t|
t.column :program_id, :integer
t.column :rule_id, :integer
add_index :programs_rules, [:program_id, :rule_id], :index_type => :unique, :name = 'program_rule_index'


def self.down
drop_table :programs_rules

So, no id column and a unique index on the composite of the two key fields. I think this achieves the goal with the join table and uses existing migration syntax without dropping to SQL execute(). This was one of your options you mentioned, so I assume it would work for you with respect to join tables.

My full presentation is available at my web site if you are interested.

With respect to the Kool-Aid, there are many examples where features have been added that enable you to work with AR in different ways...set_inheritance_column and set_table_name come to mind. Foreign keys and composite primary keys do appear to be sticking points with Rails core. However, there's no reason to not push forward with plugins for the time being. There are currently 2 plugins to work with foreign keys, one of which was mentioned above.

DHH said...

Thanks for the praise. But allow me to correct a few misconceptions. It's only natural for those to develop when you feel that techniques that have "...been beaten into my brain that they are The Right Way (tm) to accomplish a Good Thing" are "under attack". So I forgive you already :).

First, let's seperate the issues. In my mind, there's a world of difference between foreign key support and composite primary key support. The former is actually pretty well "supported" in Rails and quite a few of the core developers use them. Migrations doesn't yet provide a way to declare them, but we're willing to accept a patch to have that rectified. Fixtures support them too if you name them in the proper order. Sure, it's a little more painful than not caring about the order, but if you find great value in them, it shouldn't be much of an issue.

I personally prefer to specify that kind of constraints in the object model using the :dependent option in associations, like has_many :possessions, :dependent => :destroy. But I won't get in your way, if you really want it happening in the database.

I do object to the drama over "multiple records representing the same relationship" being the end of the world, though. First of all, Possessions should be a join model. So it's already unique by having its own id. And second, it's quite reasonable to model quantity as multiple possessions against the same two records. The possessions table may well have additional attributes governing that link, say, when the possession was made (I made one possesion on the 3rd of May, another on the 25th).

The "problem" with the code you show seems to be a misunderstanding of how associations in Rails work. A call like collector.possessions.delete(possession) is not MEANT to destroy the possession. You do that by calling either possession.destroy or collector.possessions[1].destroy. The call you're making reads like this "delete the possession from the collector's list of possessions (thus making you free to give the possession to someone else)". Considering that the method is named delete, though, I'll treat it as an honest mistake and invite you to add further documentation if you found it lacking.

Again, you could have solved that problem with a constraint in the database like "not null". Then you would get a barf. You don't need foreign keys for that. And "not null" is even supported by migrations.

Also, designs differ. In many setups, it's entirely reasonable for a foreign key to be null. That could mean that no collector has yet been assigned and you're waiting on that to happen. Whether that's a good idea or not depends on how you're modeling your domain. No right or wrong answer there.

Composite primary keys are different, though. Those fuckers are evil :). They're either the result of a join table (in which case Rails actually supports them) and I no longer consider join tables a good idea (you should reveal the domain model that your join table is hiding). Or they're the result of using natural keys. Natural keys are horrible things for a plentitude of reasons. So-called unchangeable foreign keys have a tendency to change anyway. And they're a royal pain in the ass to work with on in both the object model and in associations. In other words, they're so not worth it. All their supposed benefits can be gained by other database techniques or by implementation in the object model.

Using composite primary keys in non-join tables shows an immense amount of disrespect for the whole technology stack. They're so inconvenient to deal with in both objects, controller, and view that you're simply just being an ass if you insist on them.

tick said...

If I'm totally wrong here, please excuse it because I'm only a meta-Rails follower, I've not actually used it.

While I can kind of understand some of the reasoning for lack of/poor support for things like foreign keys and composite primary keys, it seems like a bad thing to enforce these design beliefs on Rails applications. As others have said, foreign keys do have a lot of benefits, referential integrity, embedded documentation being good examples. They are pretty standard and beneficial.

Composite keys can be argued till the cows come home, but their merits really don't matter. The fact is it's a dirty world out there. Which brings me to my primary complaint.

There have been a number of discussions of Rails in the enterprise environment. It seems to me as though Rails would be absolutely grand to build a web site from the ground up with. But a web site is simply a UI, perhaps on top of something that is more than that. Should use of Rails on existing databases/systems be thrown out the door? It's one thing if you have a web site built with some technology (Java, PHP, ASP.Net, whatever) and you want to migrate to Ruby. Fine, migrate your database. But in the enterprise world, for better or for worse, databases are often shared between applications. Sure you can abstract this, but in the real world that isn't always an immediate option.

So is the intention to discourage this kind of application of Rails? When you do have shared databases, referential integrity becomes critical. I've seen numerous shared databases that lacked RI, and frankly it was a nightmare. Both from a data perspective as well as a figuring out what the hell is going on perspective.

If Rails is trying to dodge the enterprise world, then forget everything I've said. If not, then it sure seems as though your target audience is relatively tiny based on these restrictions.

DHH said...

I definitely have sympathy for people stuck in a bad situation that they didn't design themselves in the first place. I consider being forced to work on an application that uses both composite primary keys AND operate as an integration database. Ay caramba.

For these people, I would like to see a plugin that made their life simpler. But its a fine line. If Rails just supported composite primary keys out of the box with no pain, the DBA stronghold could well argue that the NEW schema for next years application should use composite primary keys again. That would be a travesty.

So my ideal is that it should be possible, but hurt a little. Enough to imprint developers with a notion that "next time, I'll make bloody sure this doesn't happen again".

And again, please don't conflate the issues. Foreign keys are already supported quite well and many people in the Rails community are using them. If they cook your noodle, you're free to use them. Composite keys for non-join tables are a completely separate matter.

Anonymous said...
This comment has been removed by a blog administrator.
repentso said...

DHH: this is just crazy talk. When your paycheck is tied to large and complex datasets, I'm sure you'll waste no time finding religion.

Until then, you can pretend E.F.Codd and a generation of Database theorists were morons, and that whole silly Relational Calculus thing is just a bunch of stupid ideas.

And Your credentials: You wrote a PHP web framework in Ruby.

Chris said...

No one has brought up the issue of composite keys and their use in data locality (clustering).

As far as I know, mysql does not support clustered indices and instead uses the primary keys to accomplish this.

I have worked on some very large projects/frameworks for very big companies and while I am no expert I know that clustered indices were a big performance winner.