David Heinemeier Hansson is a smart and charismatic guy, but that doesn't mean he's always right. I had always thought, apparently through ignorance, that the lack of support of foreign keys and composite keys in Ruby on Rails was due to its immaturity and was something that was on the horizon. But David set the record straight in his keynote at RailsConf -- he thinks they're old-school, unnecessary, and has no intention of supporting them. I believe his words were "Why should we bend to work with them? Let them bend to work with us!" That got me thinking.
I'm not one to make rash decisions. I'm always trying to look at an issue from all conceivable perspectives before weighing in on an opinion. So when I heard mister Heinemeier Hansson's shocking declaration, I had to stop and reconsider my enterprisey religion. I've been weaned on foreign keys and composite keys. It's been beaten into my brain that they are The Right Way (tm) to accomplish a Good Thing (tm). How could these rock-solid foundations of enlightened database design be so nonchalantly tossed aside? I was breaking out into a cold sweat.
Let's agree on an example schema before I get into the nitty gritty. I'm going to stick with the same objects and relationships I've used in my last dozen rants. We've got Collectors. Collectors possess Collectibles. Possessions store this relationship. The tables look like this (generated by a migration script):
Collectors
--------
ID
Name
Collectibles
--------
ID
Name
Possessions
--------
ID
Collector_ID
Collectible_ID
Quantity
Now my old-school enterprisey noggin immediately notices that:
1. Possessions.Collector_ID should be a foreign key to Collectors.ID
2. Possessions.Collectible_ID should be a foreign key to Collectibles.ID
3. Collector_ID and Collectible_ID in Possessions should be a composite key.
4. WTF is that ID column doing in Possessions!?
Let's tackle 1 and 2 first. Why should they be foreign keys? For data integrity! If the database doesn't enforce the reference between the two tables, naughty code might create records in the Possessions table that don't actually refer to existing Collectors or Collectibles. That would be bad.
What happens if you do in fact make them foreign keys
[1]? Well, the most annoying thing is that your tests stop running because fixtures aren't smart enough to load in the proper order
[2]. It tries to load dependent records before their dependencies. Not fun.
So for the sake of argument let's consider dropping the foreign keys. Wow, I just got a cold chill down my spine. I feel dirty just for having typed that sentence. But I'll press on...
As I've already mentioned, without the referential integrity we can get bogus records in this table. That irks an obsessive compulsive anal-retentive person like myself, but I'll concede that it's not the end of the world. The system can be coded to continue to operate with bad data. Shudder.
But there's another can of worms still to open, point number 3. If the Collector and Collectible IDs in Possessions are not a composite key, we can have multiple records representing the same relationship! That's very bad. But again, I'm going to try to be open-minded here. We can, at the very least, prevent that by making the two columns a "unique" constraint. That's still allowed within the Rails realm, isn't it? Well, not really, which brings us to point 4.
Rails, and DHH, believe that the one-primary-key-per-table world is a safe and workable place to be. This is why the Possessions table has a unique ID column. OK, I can live with that, I think. But here's where things start to get sticky (or stinky?) and I need to add some code here to demonstrate the problem. Here's the models:
class Collector < ActiveRecord::Base
has_many :possessions
has_many :collectibles, :through => :possessions
end
class Collectible < ActiveRecord::Base
end
class Possession < ActiveRecord::Base
belongs_to :collector
belongs_to :collectible
end
And here's a little test to create a Possession:
collector = Collector.find(19)
collectible = Collectible.find(72)
possession = Possession.new
possession.collector_id = collector.id
possession.collectible_id = collectible.id
possession.save!
And here's what we have in the database:
mysql> select * from possessions;
+----+--------------+----------------+----------+
| id | collector_id | collectible_id | quantity |
+----+--------------+----------------+----------+
| 19 | 19 | 72 | NULL |
+----+--------------+----------------+----------+
1 row in set (0.00 sec)
So far so good. But what happens when we need to delete a Possession? Here's the code:
possession = collector.possessions.find_by_collectible_id(72)
collector.possessions.delete( possession )
And here's what we're left with in the database:
mysql> select * from possessions;
+----+--------------+----------------+----------+
| id | collector_id | collectible_id | quantity |
+----+--------------+----------------+----------+
| 19 | NULL | 72 | NULL |
+----+--------------+----------------+----------+
1 row in set (0.00 sec)
Gah! That's one of them orphaned records I warned you about. What if we run the whole test again? Will it reclaim that record?
mysql> select * from possessions;
+----+--------------+----------------+----------+
| id | collector_id | collectible_id | quantity |
+----+--------------+----------------+----------+
| 19 | NULL | 72 | NULL |
| 20 | NULL | 72 | NULL |
+----+--------------+----------------+----------+
2 rows in set (0.01 sec)
Nope! Now we've got two lost souls. If we'd applied that unique constraint I'd mentioned earlier to the un-foreign key columns, the database would have thrown a fit and Rails would have choked.
So what's going on here? Well, from my point of view, it seems that Rails not only dissuades me from designing my database with proper referential integrity but it also litters my tables with orphaned records. I hope that I'm wrong here and some noble Rails advocate will set me straight, but even if there is some way of correcting this behavior it's clearly not the
default behavior and that goes against Rails' mantra of convention over configuration -- I shouldn't have to
configure Rails to
not break referential integrity and
not clutter my tables with bogus data.
Josh Susser and I have been discussing this issue in our blogs and via e-mail -- and we should have discussed it at RailsConf but I was in burn-out mode and didn't have the cognitive energy -- but we're both attempting to build plug-ins to make better sense of this issue. Rather than collaborating on a single solution, we're tackling it from different angles, but one thing our solutions seems we have in common is that we both agree those orphaned records need to be deleted. In reading a copy of Josh's code (and I hope he doesn't mind me revealing this) I found the amusing comment:
# delete ??? - would need to call destroy on join model instance.
And that's exactly what I did in my plug-in:
[snip]
# the default way that rails handles this is by
# "setting their foreign keys to NULL"
# this would leave orphaned records in the database,
# and I'm not cool with that
# AND since my migrations set the foreign keys to "not null" (duh!)
# this breaks hard-core
# so what I should do here is look up the unique record ID for the relationship
# (aside: why the hell do they have unique IDs rather than compound keys!?)
# and delete the record outright
@reflection.klass.delete( wrapper.id )
# TODO: I need to remove the wrapped relationship from the superclass
# collection as well or it will be in an invalid state. As a quick hack I might
# be able to simply force a reload of it from the database.
[/snip]
At the moment I'm torn on the issue. Do I want to "bend" to the Rails way and just live with the lack of referential integrity and bogus data? Or do I want to try to make Rails bend to work The Right Way(tm)? Obviously I'm trying the latter, and so are other prominent Railists like Josh, which to me is a good sign that this is an important issue. Hell, even Dave Thomas complained about it in his keynote at RailsConf. Thankfully I'm not yet using Rails for any commercial work so it's all academic mental self-pleasure for now. But if I were put into a position where I had to make the call, it would be an incredibly grueling decision, and it shouldn't have to be.
Footnotes:
[1] You can add foreign keys manually in the migration script via raw SQL or use Simon Harris' plug-in. Note that his plug-in is clever enough to not apply the foreign keys to the testing database so your fixtures don't barf.
[2] There are hacks you can use to control the order of fixture loading if you really want/need it.