Monday, 25 January 2010

Thomas Kuhn, Paradigms, NoSQL and the RDBMS

It’s been a long time since I last blogged, and this may be the last time I blog here as I feel the title no longer adequately reflects my interests but…

As I prepare my talk for #DDD8, on NoSQL generally and CouchDB in particular, the applicability of Kuhn’s concept of the paradigm to the current NoSQL/RDBMS situation becomes more and more interesting to me.

I’ll start by, in brief, stating what is at least in some quarters considered self-evident truth:

  • The RDBMS has been the norm for storing data from applications for a long time;
  • RDBMS technologies (Oracle in particular, but the others to differing degrees as well) have become too complex and have suffered from feature bloat;
  • The problems which the RDBMS can’t solve (at least not elegantly) have grown in number and importance;
  • Technology (in particular hardware) has changed dramatically;
  • The problems which we want an RDBMS to solve have changed dramatically also.

OK, that said, let’s briefly look at what Kuhn says in his seminal book ‘The Structure of Scientific Revolutions’ (my copy is in my parents’ attic so I’m going here from memory & old uni essays on my HDD).

Science doesn’t build one discovery upon the next in a neat fashion. An initial paradigm arises in response to particular problems which it is seen to be helpful in solving.

“Paradigms gain their status because they are more successful than their competitors in solving a few problems that the group of practitioners have come to recognise as acute. To be more successful is not, however, to be either completely successful with a single problem or notably successful with any large number.” (Kuhn 1962:23)

Once this has happened, the paradigm achieves a level of dominance (hegemony perhaps) over thinking within that community, and further problems are defined in its terms. In order to accommodate the solving of new, different problems the theory grows, becoming increasingly unwieldy. For some problems, however, embellishing the model to resolve them is not possible. Over time such anomalies grow in number and in importance. Eventually ‘rebels’ propose counter-theories. Kuhn describes this period as one of crisis and of revolution. An alternative candidate will only supplant the current paradigm if it allows for a better accommodation of the previously accrued anomalies and also provides a new impetus for scientific study.

So, to make clear the parallels I’m suggesting between Kuhn’s theory of paradigms and the current situation with storing data, I’ll go over my initial bullet points in the light of what I’ve just said about Kuhn:

  • The RDBMS model (and in particular the expression of this model in technologies like Oracle & SQL Server) has had a dominant paradigmatic (hegemonic) status for a long time now;
  • In order to answer new problems the RDBMS tech (and to an extent, though I know less of them, the models that support this tech) has become bloated as it is embellished (has features added);
  • A growing number of problems which the RDBMS cannot solve (at least in ways that are considered reasonable) have become increasingly important and visible;
  • Technology changing is a bit like the laws of the universe changing for a physicist. No wonder we’re churning paradigms in IT. The move from Aristotelian to Newtonian physics was in large part due to the problems that were being posed and the inability of the former paradigm to answer them effectively. Well, that’s happened for us, but the universe changed as well. Processor speed, size (and cost) of available memory, … All these things have changed dramatically and have a massive bearing on the utility of any theory.

I think that we are in a period of crisis when it comes to how we store data. I don’t mean this in a bad way. Periods of crisis are incredibly creative periods where new models are considered and experimented with. I don’t think that we will necessarily see the emergence of a single replacement paradigm either. Through the recognition that different problems require different solutions, I think that we may see a number of ‘new’ paradigms emerge which don’t so much compete as complement each other (network-oriented solutions like Neo4j and document databases like CouchDB, for example). Nor am I suggesting that the RDBMS model will go away. I do think, though, that the big-vendor bloat that has occurred, with all the associated costs which this brings, may mean that lighter-weight RDBMS implementations emerge. That said, they may just wither and die, as problems like the object-relational impedance mismatch are unlikely to disappear soon, I suspect, and ORMs just add another layer of bloat to the RDBMS. In fact, ORMs remind me of the ‘doing the wrong thing righter just makes you wronger’ quote from Dr. Russell Ackoff that I love.
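To make the object-relational impedance mismatch a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is made up for illustration: the `order` object, the table names, and the plain dict standing in for a document store (a real CouchDB is reached over HTTP, which is not shown here). It stores the same nested object twice, once flattened into relational tables (the mapping an ORM automates) and once as a single JSON document.

```python
import json
import sqlite3

# Hypothetical application object: an order with nested line items.
order = {
    "id": 1,
    "customer": "Ada",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}


def save_relational(conn, order):
    """Flatten the nested object into two tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items "
        "(order_id INTEGER, position INTEGER, sku TEXT, qty INTEGER)"
    )
    conn.execute(
        "INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"])
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?, ?)",
        [
            (order["id"], pos, item["sku"], item["qty"])
            for pos, item in enumerate(order["items"])
        ],
    )


def load_relational(conn, order_id):
    """Reassemble the nested object from its rows."""
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY position",
        (order_id,),
    ).fetchall()
    return {
        "id": row[0],
        "customer": row[1],
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }


def save_document(store, order):
    """Store the object whole, as one JSON document keyed by id."""
    store[str(order["id"])] = json.dumps(order)


def load_document(store, order_id):
    return json.loads(store[str(order_id)])


conn = sqlite3.connect(":memory:")
save_relational(conn, order)

store = {}  # stand-in for a document database
save_document(store, order)

print(load_relational(conn, 1) == order)
print(load_document(store, 1) == order)
```

Both routes give back the same object. The difference is how much translation code sits between the object and its stored form, and that translation layer is what the ORM exists to hide.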

Next up (but probably not here), ‘agile/lean as liberal modes of governance – the applicability of Foucault to project management in IT’, or, ‘changing the enterprise using wars of position – a guide to Gramsci for agents of change’, depending on what I feel like.

Kuhn, Thomas S. (1962) The Structure of Scientific Revolutions. Chicago: University of Chicago Press