Square Data and Round Holes

My first programming job was done using Report Program Generator (RPG) on the IBM System/36. The hardware was green screen, the tape decks reel-to-reel, and the printers large and noisy. The language itself was very data-centric, with each program declaring formatted input or output data structures that were read from or written to. Each structure mapped to a file, a screen buffer, or a printer spool. In spite of all this we did get the job done, although our biggest problem was the business changing requirements on us in ways that necessitated altering the data structures. Because the structures were burned into each program's source specification, changing a file required altering every program that used it, plus a conversion job to update the live user data to the new format. The inertia of this task made it important to do thorough up-front analysis to get the data relationships and attributes as correct as the available knowledge allowed.

Preemptive flexibility sometimes included soft-coding all of the data structures with anonymous names such as "user field 1" or "user field 2." Separate definition files for each application mapped which fields were used in which context. The intention was that when the business required a new data attribute, an unused extra field was simply activated by hacking around with the definition files to make the new attribute available on screens and reports.
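The soft-coding idea can be sketched in Java rather than RPG. This is only an illustration under stated assumptions: the class name SoftCodedRecord, the four-slot layout, and the attribute names are all hypothetical, and the in-memory map stands in for the separate definition file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a record with a fixed physical layout of anonymous
// slots ("user field 1", "user field 2", ...) plus a soft-coded definition.
class SoftCodedRecord {
    // The physical layout never changes; spare slots are reserved up front.
    private final String[] userFields = new String[4];

    // Stands in for the definition file: business attribute -> slot index.
    private final Map<String, Integer> definition = new HashMap<>();

    // "Activating" a spare slot changes only the definition, not the layout.
    void define(String attribute, int slot) {
        if (slot < 0 || slot >= userFields.length) {
            throw new IllegalArgumentException("No such slot: " + slot);
        }
        if (definition.containsValue(slot)) {
            throw new IllegalStateException("Slot already in use: " + slot);
        }
        definition.put(attribute, slot);
    }

    void set(String attribute, String value) {
        userFields[slotFor(attribute)] = value;
    }

    String get(String attribute) {
        return userFields[slotFor(attribute)];
    }

    // Resolve an attribute name to its slot, failing loudly if undefined.
    private int slotFor(String attribute) {
        Integer slot = definition.get(attribute);
        if (slot == null) {
            throw new IllegalArgumentException("Undefined attribute: " + attribute);
        }
        return slot;
    }
}
```

Adding a new attribute is then a call such as define("faxNumber", 1) with no change to the stored layout. The price is that the layout itself no longer says what the data means, and the supply of spare slots is finite.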

The thing that grabbed me about object-oriented programming when I was first introduced to it (through the Smalltalk language and then Java) was that all of this would be fixed. The program was no longer concerned with its data; instead this was all encapsulated inside objects that provided a behavioral API, making the system more malleable and extensible. Inheritance and polymorphism and other facets are nice features of the language, but data encapsulation was the key thing that sold me.

The first couple of systems I worked on were business apps that had to deal with back-end relational corporate data, and the problem that arose is the well-trodden one of how to persist objects in a relational database. Objects have things like inheritance, many-to-many relationships, many-ended links with no back pointer, untyped data structures (in the case of Smalltalk), and other facets that just don't fit into a row/column format. Initially, as I wrestled with this impedance mismatch by writing or using fancy frameworks, I always believed that this was a temporary exercise, required only because of the existence of legacy relational databases; OO databases were just around the corner, and their arrival would cure all our maladies.
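To make the mismatch concrete, here is a minimal sketch of one common workaround: flattening an inheritance hierarchy into a single row shape with a type column. Everything in it is an assumption for illustration (the Account classes, the column names, and the use of a Map in place of a real database row); it is not taken from any particular framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class hierarchy that has no direct row/column equivalent.
abstract class Account {
    final String id;
    Account(String id) { this.id = id; }
}

class SavingsAccount extends Account {
    final double interestRate;
    SavingsAccount(String id, double interestRate) {
        super(id);
        this.interestRate = interestRate;
    }
}

class CheckingAccount extends Account {
    final double overdraftLimit;
    CheckingAccount(String id, double overdraftLimit) {
        super(id);
        this.overdraftLimit = overdraftLimit;
    }
}

// Maps every subclass onto one row shape; a Map stands in for the table row.
final class AccountRowMapper {
    static Map<String, Object> toRow(Account account) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", account.id);
        // Columns that belong to other subclasses are simply left null.
        row.put("TYPE", null);
        row.put("INTEREST_RATE", null);
        row.put("OVERDRAFT_LIMIT", null);
        if (account instanceof SavingsAccount) {
            row.put("TYPE", "SAVINGS");
            row.put("INTEREST_RATE", ((SavingsAccount) account).interestRate);
        } else if (account instanceof CheckingAccount) {
            row.put("TYPE", "CHECKING");
            row.put("OVERDRAFT_LIMIT", ((CheckingAccount) account).overdraftLimit);
        } else {
            throw new IllegalArgumentException("Unmapped subclass: " + account.getClass());
        }
        return row;
    }

    // The TYPE column is the only thing that tells us which class to rebuild.
    static Account fromRow(Map<String, Object> row) {
        String id = (String) row.get("ID");
        String type = (String) row.get("TYPE");
        if ("SAVINGS".equals(type)) {
            return new SavingsAccount(id, (Double) row.get("INTEREST_RATE"));
        }
        if ("CHECKING".equals(type)) {
            return new CheckingAccount(id, (Double) row.get("OVERDRAFT_LIMIT"));
        }
        throw new IllegalArgumentException("Unknown TYPE: " + type);
    }
}
```

Even in this tiny case the row carries columns that are meaningless for most of its rows, and every new subclass means touching the mapper and the table together, which is the kind of friction the frameworks tried to hide.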

Frederick Brooks has a chapter "No Silver Bullet" in his superb treatise The Mythical Man-Month, and unfortunately my belief in OO databases fell naively foul of his prediction. I had the good fortune to work with a very powerful OO database while programming for a bank. It did persist the objects effectively; however, it failed to meet the business's needs precisely because it was structured around objects and not rows and columns. Users needed fast and varied access to their data, and just about every existing application, from spreadsheets to off-the-shelf GUI builder tools, was on their desktops itching to access the corporate data. These all required the data to be relational, and although an ODBC-to-OO bridge existed, apart from being an intellectually nasty bolt-on, it meant that the users' view of the data was by definition a relational one. After a while some converts began suggesting it was pointless to be doing OO at all, and one strong argument came from the fact that the data itself was inherently row/column based because that was how it was received by the system and its users. Input came from external data feeds (where the data was structured) or from manual input, where tables and lists were rows of data and fields of data populated the columns.

Object-oriented databases do work and are widely used in apps that don't need to publish their data as a corporate database (such as those in embedded devices); for the corporate world, however, it looks as though OO databases haven't achieved the critical mass required to become widely accepted. On one major database vendor's Web site, the listing of their product portfolio promoted their relational mapping software in preference to their (very good) OO database as the initial deployment configuration for J2EE.

One way in which object-oriented data stores might enjoy a renaissance is with XML. By its nature XML is structured around a tree of nodes that can repeat and contain further nodes, and its data elements are optional. Although XML is popular as a readable structured message format, it's also used as a way of representing and persisting data. XML doesn't fit easily into rows and columns; if it's persisted as raw text, however, then answering queries requires a search engine that is able to peek at its contents. This is essentially what Web search engines do. Initially they just queried into HTML (a close relative of XML, marked up specifically for browsers), although now they recognize specific content formats (.doc, .pdf) and promise to even embrace the desktop's contents itself as their data source (http://news.com.com/Google+to+unveil+desktop+search/2100-1024_3-5408765.html?tag=nefd.lede).

If the future of search engines is to query data irrespective of source, and the flexible and user-friendly nature of searches exceeds anything that SQL could do for a nontechnical corporate user, is it possible that object-oriented databases will be reborn with the required search engine interfaces? Or is the problem simply that data sticks where it lands, and most companies are loath to physically move data from its initial resting place lest the downtime and potential errors create more problems than are solved?

Joe Winchester, Editor-in-Chief of Java Developer's Journal, was formerly JDJ's longtime Desktop Technologies Editor and is a software developer working on development tools for IBM in Hursley, UK.

