Over the past few months, I’ve been incrementally modernizing the public-facing portion of this ColdFusion blog. And, all in all, it’s been going quite smoothly! But, when I started to work on modernizing the administrative portion of the blog, things got a little hairy. Not only is the administrative code even older and jankier than the public code, it requires CRUD (Create, Read, Update, Delete) operations on “join tables” in the MySQL database. And, up until now, I haven’t had to think about modeling these join tables in a data access layer (DAL).
My current database modeling contains two “join tables”. For the moment, let’s consider these “join tables” to be tables that do nothing more than relate entities in a many-to-many relationship. My two join tables are:
blog_entry_blog_entry_jn– this allows two blog entries to be related to each other. In my UI (user interface), I list out the related entries at the bottom of a blog detail page.
blog_entry_tag_jn– this allows me to categorize a blog entry as belonging to an arbitrary set of tags (ex, “ColdFusion”, “SQL”, “Work”). In my UI, I list out the tags associated with each blog post. And, I also allow the user to look up all the blog entries under a given tag.
As I started to flesh-out an official data access layer in my administrative module, I thought that starting with “tags” would be a “quick win” since it’s so simple. But, I had forgotten about this
blog_entry_tag_jn table. And suddenly, I was mentally blocked. I wasn’t sure if the “tag” DAL should be responsible for it? Or, if the “blog entry” DAL should be responsible for it?
At work, I brought this dilemma up in the Architecture Office Hours meeting (my favorite meeting of the week), and Alex Treyger – one of our Principal Engineers – pushed back against the very notion of a “join table”. He argued (in as much as I understood) that such an idea was nothing more than an implementation detail of the data persistence technology.
At the time, I didn’t entirely followed what he was saying. But, with some time to reflect on it, what I think what he meant was that in the application code there is no “join table” – there’s just an architectural relationship between entities. And, that it’s merely incidental that, in a relational database system, I happen to have chosen to serialize that concept in a separate table. And that, if I was using something other than MySQL, it’s quite possible that I wouldn’t even need a separate table to model the relationship (I think he brought up Object Oriented Databases as one such example).
As I was thinking about my database structure, and thinking about what Alex said, it reminded me of something I touched on a few years ago in a thought experiment wherein I split a wide a data table up into two tables for high-writes and high-reads. In that post, I said:
Another general pattern that I am seeing here is the elevation of “domain concepts”. What I mean by that is the promotion of an “entity property” into a full-on “entity”. In this case, the “User” property
lastRequestAtis being promoted to the domain-level concept of “Presence”.
The more complex my applications become, the more I find myself doing – or wishing I had done – this (elevating entity properties). Creating a domain-level concept makes it easier to think about how data can be consumed and evolved within an application.
The saddest part about this is that I actually had a very similar thought 14 years ago when I wrote about SQL “Join Tables” vs. SQL “Entity Tables”. And, in the comments to that post, Mark Bugno pushed back in way that feels reminiscent of what Alex was saying:
I disagree with you on this (and agree with your initial train of thought). There is no fundamental difference between a relationship table and entity table, in much the same way there is no difference between a circle and an ellipse – one is simply a specific instance of the other. In the geometric example, a circle is an ellipse where the foci just happen to be collocated; in the database example, a relationship table is an entity table that just happens to have no additional columns.
Every relationship (be they one-to-one, one-to-many or many-to-many) can be garnished with additional details if one so chooses. In the case of one-to-one and one-to-many relationships, we might create a table between the existing two and add the details there. The only difference with a many-to-many is that the table already exists.
Alex and Mark aren’t saying the exact same thing. But, I think both of them are alluding to the idea that a “join table” isn’t a “natural concept” – that it is merely a construct that I am using because of how I’ve organized the data in my head.
And, going back to my earlier thoughts on the “elevation of domain concept”, when I split the wide table up into two narrower tables, it wasn’t so much a technical shift as it was a mental shift: changing the way that I was thinking about the data that I was persisting to the database. The change in perspective precipitated the change in the schema.
In an attempt to synthesize all of these ideas into a spicy take, I’m now thinking that a “join table” is nothing more than a “entity” that is lacking a proper name. Naming stuff is hard (Martin Fowler) – we all know this. And, I think this is why “join tables” happen: because I couldn’t conjure a name that clearly “meant that thing”.
So, what if instead of
blog_entry_tag_jn, I had a “Tagging” entity. And, what if instead of
blog_entry_blog_entry_jn, I had a “Suggested Reading” entity. These names aren’t great; but, they are – at least – an attempt to articulate the “thing” that the join tables were trying to represent.
Is it too much to conclude here that a “join table” is actually just a “code smell”? I’m gonna wear that hat for a while and see how it fits.