3 Rules for Data-driven Architecture
Updated · Sep 03, 2013
At a recent conference, Dave Turek of IBM asserted that the future computing architecture would have data rather than computation “at the center.” What he is saying, I believe, is that enterprise IT differentiation and competitive advantage now derive from deeper dives into Big Data, which yield proprietary knowledge of customers and better ways to retain them, rather than from generic business apps that competitors can easily match.
So how should IT and vendors together design such a “data-driven” enterprise architecture? Let me suggest three rules that past experience has shown are critical to full, long-term success.
Rule 1: Virtualize the Data
To say that data is conceptually at the “center” of an architecture is not to say that it makes sense to move all data into a central physical repository. On the contrary, data virtualization and cloud experience has shown that it makes sense to leave large data repositories where they are and “bring the computation to the data” in order to maximize performance.
The enterprise data-driven architecture should present the data as one “virtual” database, preferably exposed as a Web service, which also fits better with administrative tooling.
The best way to do this is with data virtualization tools, available from data virtualization vendors and (to a certain extent) master data management providers. These typically provide a global view of the data, a single interface for programmers to code against, and global administrative tools.
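To make this concrete, here is a minimal sketch in Python of the “one virtual database” idea. The source name, schema, and SQL are hypothetical, and a toy in-memory SQLite database stands in for a remote repository; the point is only that the facade leaves data where it lives and ships the query to the owning source, so that just the (usually much smaller) result set travels back through a single interface.

```python
# A minimal sketch of the "one virtual database" idea: a facade that keeps data
# where it lives and pushes queries down to each source, rather than copying
# everything into a central repository. Source names and SQL are hypothetical.
import sqlite3


class VirtualDatabase:
    """Routes queries to registered sources; callers see a single interface."""

    def __init__(self):
        self._sources = {}  # logical name -> connection

    def register(self, name, connection):
        self._sources[name] = connection

    def query(self, name, sql, params=()):
        # The computation (the SQL) is shipped to the source that owns the data;
        # only the result set travels back to the caller.
        cursor = self._sources[name].execute(sql, params)
        return cursor.fetchall()


if __name__ == "__main__":
    # A stand-in "remote" repository, modeled here with in-memory SQLite.
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "east"), (2, "west"), (3, "east")])

    vdb = VirtualDatabase()
    vdb.register("crm", crm)

    # The aggregation runs at the source; the caller never moves the raw rows.
    print(vdb.query("crm", "SELECT region, COUNT(*) FROM customers GROUP BY region"))
```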
Rule 2: Maximize Information Quality
Information quality, as I define it here, is not the same as the common usage of the term “data quality” (although it includes it). Instead, it means delivering high-quality data to the correct recipients as rapidly as possible, so that they can use the resulting analyses for more effective decisions.
To do this, I suggest thinking of the enterprise's information processing as a series of “stages”:
- Input, where new data is made correct;
- Merger with existing data, where new data is made consistent with older data;
- Aggregation, where the new data is made visible across the enterprise;
- Delivery, where the correct targets are identified and the correct information is delivered to them for analysis.
Note that because of virtualization, all of these stages can be performed on data “where it lies.” The point is that rapidly delivering bad data, or delivering data to the wrong people, is no better than, and sometimes worse than, no data-driven optimization at all.
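As a rough illustration, the sketch below walks a few records through the four stages above. Only the stage names come from the discussion here; the record format, normalization rule, and recipient list are hypothetical stand-ins.

```python
# A minimal sketch of the four information-quality stages applied to a stream
# of records. The record format, validation rules, and recipient routing are
# hypothetical; only the stage names come from the article.
from typing import Any, Dict, List


def input_stage(record: Dict[str, Any]) -> Dict[str, Any]:
    """Input: make new data correct (normalize values, fill obvious gaps)."""
    record = dict(record)
    record["region"] = record.get("region", "unknown").strip().lower()
    return record


def merge_stage(record: Dict[str, Any],
                existing: Dict[int, Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Merger: make new data consistent with older data (newer values win here)."""
    existing[record["id"]] = {**existing.get(record["id"], {}), **record}
    return existing


def aggregate_stage(existing: Dict[int, Dict[str, Any]]) -> Dict[str, int]:
    """Aggregation: make the data visible enterprise-wide as a summary view."""
    summary: Dict[str, int] = {}
    for rec in existing.values():
        summary[rec["region"]] = summary.get(rec["region"], 0) + 1
    return summary


def delivery_stage(summary: Dict[str, int], recipients: List[str]) -> None:
    """Delivery: send the right information to the right targets for analysis."""
    for recipient in recipients:
        print(f"to {recipient}: {summary}")


if __name__ == "__main__":
    store: Dict[int, Dict[str, Any]] = {}
    for raw in [{"id": 1, "region": " East "}, {"id": 2, "region": "West"}]:
        store = merge_stage(input_stage(raw), store)
    delivery_stage(aggregate_stage(store), ["sales-analytics", "cx-team"])
```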
Rule 3: Let Data Drive the Analysis
This is a subtle point, but an important one. In the past, analyses have generally proceeded by trying to make the data fit preconceived notions: database administrators create metadata that optimizes for past patterns of analysis, which hinders adapting the database to new needs. In my own experience, it was possible to create better programs simply by exposing the structure of the data, including changes in the types of information it contains, to the user, and letting the user drive how the data is used.
Thus, the focus of the enterprise architecture should be on a more flexible form of metadata repository that can “load balance” between multiple, more specialized information sources. Experience, both my own and that of data virtualization vendors, suggests that this approach will perform and scale better in the long run.
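Here is a minimal sketch of what “letting the data drive the analysis” can look like in code: rather than programming against a fixed, preconceived schema, the program discovers the structure of whatever records arrive, including fields the original design never anticipated, and exposes that structure to the user. The field names are purely illustrative.

```python
# A minimal sketch of letting the data drive the analysis: instead of coding
# against a fixed, preconceived schema, the program discovers the structure of
# whatever records arrive and exposes it to the user. Field names are hypothetical.
from collections import defaultdict
from typing import Any, Dict, Iterable


def discover_structure(records: Iterable[Dict[str, Any]]) -> Dict[str, set]:
    """Report which fields appear and which types they carry, including new ones."""
    fields: Dict[str, set] = defaultdict(set)
    for record in records:
        for name, value in record.items():
            fields[name].add(type(value).__name__)
    return dict(fields)


if __name__ == "__main__":
    # The second record introduces a field the original design never anticipated;
    # the analysis surfaces it instead of silently dropping it.
    batch = [
        {"customer": "acme", "spend": 1200},
        {"customer": "globex", "spend": 800, "sentiment": "positive"},
    ]
    for field, types in discover_structure(batch).items():
        print(f"{field}: {sorted(types)}")
```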
Early Days of Data-driven Architecture
It is, of course, too early to say in which direction Turek's architecture with “data in the center” will go. However, it is not too early for IT to consider how to avoid the kind of lock-in to particular hardware or software vendors that would keep it from scaling with the enormous demands of Big Data, and from continuing to scale into the indefinite future. Virtualization, information quality and a data-driven approach to analytics should be a big help in achieving that goal.
Wayne Kernochan is the president of Infostructure Associates, an affiliate of Valley View Ventures that aims to identify ways for businesses to leverage information for innovation and competitive advantage. Wayne has been an IT industry analyst for 22 years. During that time, he has focused on analytics, databases, development tools and middleware, and ways to measure their effectiveness, such as TCO, ROI, and agility measures. He has worked for respected firms such as Yankee Group, Aberdeen Group and Illuminata, and has helped craft marketing strategies based on competitive intelligence for vendors ranging from Progress Software to IBM.