By Dean Allemang
February 12, 2009 12:19 PM EST
All organizations, including multinational corporations and government agencies, face a common problem: enterprise data integration. The most visible large-scale source of the problem is mergers and acquisitions. When a large company is formed from other pieces, each piece brings with it its own data, in its own form.
But the problem isn't restricted to large conglomerates. All businesses have information trapped in a wide variety of forms, including e-mail, spreadsheets, Web pages and a variety of proprietary sources. More than ever, it's difficult for a business to know what it knows.
This isn't a new problem, and a variety of enterprise data integration solutions have been on the market for a number of years. But the problem remains. Why? All too often, a new enterprise integration solution is designed for a particular use, and while it's successful in that context, it fails to be extensible to other uses. In extreme cases, the need for integrated information has evolved so much that the new integrated system is obsolete before it even goes live. The integrated enterprise system simply becomes yet another information source, competing now with the originals. The problem has become bigger, not smaller.
The solution to enterprise data integration can't just be another data system added into the enterprise mix. It has to be a living, extensible network of information in the enterprise. In short, it has to work like the Web.
The Semantic Web & Enterprise Data
The World Wide Web Consortium (W3C) has been working on the problem of extending the Web we know today into a Web of Data. This new generation of the Web - dubbed the Semantic Web by W3C and sometimes known as Web 3.0 - harnesses the power of the World Wide Web for managing data. Many of the features that are now familiar to us from the Web are directly relevant to the real problem of enterprise data integration:
- The Web is extensible, and not just by its designers
- Anyone can refer to any Web resource
- Information in any format is available
Agreement & the Semantic Web
A common misconception about the Semantic Web is that it is based on a universal agreement about the meaning of terms. Indeed, if we could get everyone in the world to agree on what the word customer means, then information integration on a worldwide scale would be greatly simplified. But this is an unrealistic expectation. Different companies, and even different workers in a single company, have legitimate and differing notions about what even such a basic word as customer means.
Suppose two branch offices of the same company, one in Kansas City and one in New York, use the word customer differently. While it is unrealistic to ask the two branches to agree to use the word the same way, it's not unrealistic to discuss which use is more general, and how. But before we can even say, "The Kansas City office uses the word customer in a more specific way than the New York office does," we have to be able to refer to "The Kansas City office's use of the word customer" and "The New York office's use of the word customer." The Semantic Web provides agreement just at this level: agree on how to refer to your terms, so that you can discuss how you agree and disagree on their meaning.
This kind of agreement is achieved by having a single global reference for everything. This may seem like an ambitious goal, but it's in fact the part of the Semantic Web that's borrowed lock, stock, and barrel from the current Web, where entities are identified and managed with identifiers called URIs (which are slight variants of the familiar URLs we use in Web browsers every day). The URI is the key to the extensibility of the World Wide Web we know today, and serves as the basis for the extensibility in the Semantic Web.
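To make the idea of referring to each office's term concrete, here is a minimal sketch in plain Python. The two example.com namespaces are hypothetical, and using the RDF Schema property rdfs:subClassOf to say "more specific than" is one plausible choice, not something the article prescribes.

```python
# Hypothetical namespaces: each office publishes its terms under its own URI prefix.
KC = "http://example.com/kansas-city/terms#"
NY = "http://example.com/new-york/terms#"

# RDF Schema property stating that the subject class is a subset of the object class.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

kc_customer = KC + "Customer"  # the Kansas City office's use of "customer"
ny_customer = NY + "Customer"  # the New York office's use of "customer"

# Because the two URIs differ, the two uses never collide, and we can
# state how they relate without forcing either office to change.
statements = {
    (kc_customer, RDFS_SUBCLASS_OF, ny_customer),
}


def is_more_specific(statements, narrower, broader):
    """Return True if a chain of subClassOf statements leads from narrower to broader."""
    seen = set()
    frontier = [narrower]
    while frontier:
        term = frontier.pop()
        if term == broader and term != narrower:
            return True
        if term in seen:
            continue
        seen.add(term)
        frontier.extend(
            o for s, p, o in statements if s == term and p == RDFS_SUBCLASS_OF
        )
    return False


print(is_more_specific(statements, kc_customer, ny_customer))
print(is_more_specific(statements, ny_customer, kc_customer))
```

The only thing the two offices had to agree on is the naming scheme; the disagreement about meaning is recorded as data rather than resolved by decree.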
Representing Data on the Semantic Web
Data representation in the Semantic Web is based on a standard called RDF, which breaks data representation down to its most basic part. In RDF, this is called a triple. A triple is a basic statement about a relationship. The three parts of the triple are called the subject, predicate, and object (borrowing terminology from basic grammar), where the subject and the object are two entities that are related to one another, and the predicate specifies the relation. A triple holds the same information as a cell in a spreadsheet or a database: the row id, column id, and cell contents make up the three parts of the triple (see Figure 1).
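The spreadsheet analogy can be sketched in a few lines of Python. This is an illustration only: the base URI, the "row/" and "column/" naming scheme, and the sample rows are assumptions made for the example, and a real RDF toolkit would offer richer types than plain tuples.

```python
BASE = "http://example.com/crm/"  # hypothetical namespace for this data source


def table_to_triples(rows, id_column, base=BASE):
    """Turn each non-id cell into one (subject, predicate, object) triple.

    subject   = URI built from the row id
    predicate = URI built from the column name
    object    = the cell contents
    """
    triples = []
    for row in rows:
        subject = base + "row/" + str(row[id_column])
        for column, value in row.items():
            if column == id_column:
                continue  # the id names the row; it is not a cell of its own
            predicate = base + "column/" + column
            triples.append((subject, predicate, value))
    return triples


# Made-up sample data standing in for a spreadsheet.
rows = [
    {"id": 17, "name": "Acme Corp", "city": "Kansas City"},
    {"id": 18, "name": "Globex", "city": "New York"},
]

triples = table_to_triples(rows, "id")
for triple in triples:
    print(triple)
```

Two rows with two data columns each yield four triples, one per cell.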
Using this simple model, information from any data source (spreadsheets, databases, XML documents, Web pages, RSS feeds, e-mail, ...) can be represented in a uniform way. Since all information is referenced via global URIs, any data source can refer to any other. This is how the Semantic Web achieves the same extensibility as the familiar Web.
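Once every source is expressed as triples over shared URIs, combining sources amounts to a set union. The sketch below assumes two hypothetical sources (a spreadsheet and an e-mail archive) that both refer to the same made-up customer URI; the property URIs are likewise invented for the example.

```python
CUSTOMER_17 = "http://example.com/crm/customer/17"  # hypothetical shared URI

# Triples extracted from a spreadsheet (sample data).
spreadsheet_triples = {
    (CUSTOMER_17, "http://example.com/crm/name", "Acme Corp"),
    (CUSTOMER_17, "http://example.com/crm/city", "Kansas City"),
}

# Triples extracted from an e-mail archive (sample data).
email_triples = {
    (CUSTOMER_17, "http://example.com/mail/lastContacted", "2009-01-30"),
    # The same fact as in the spreadsheet; the union keeps only one copy.
    (CUSTOMER_17, "http://example.com/crm/name", "Acme Corp"),
}

# Integration is a plain set union: no schema migration, no shared table layout.
merged = spreadsheet_triples | email_triples


def describe(triples, subject):
    """Return every (predicate, object) pair known about a subject, sorted."""
    return sorted((p, o) for s, p, o in triples if s == subject)


for predicate, value in describe(merged, CUSTOMER_17):
    print(predicate, "->", value)
```

Neither source had to know about the other in advance; they only had to use the same URI for the same customer.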
Enterprise Data Integration - Before and After
There are a number of approaches to enterprise data integration today. While there are some key differences in these approaches, they have some things in common. In all these approaches, a model, corresponding loosely to a Master Data File in earlier technology, is built to reflect the requirements of the integrated data set. Existing data is then mapped to this model. The approaches differ in the expressiveness of this model and the details of the mapping (e.g., is data transformed and warehoused, or left in situ and proxied), but in all cases, the model itself is rigid and proprietary.
Semantic Data Integration differs in a number of ways. While it also relies on a model of the integrated data, the model itself is represented in RDF. This means that the model itself is extensible and flexible. If a Semantic Data Integration model is obsolete, it can be extended easily. And not just by its designer; as a Web model, it can be extended by anyone. Representing the model in RDF also means that it is backed by a standard; any RDF model can be loaded into a wide variety of vendor tools with no loss of information. Unlike previous proprietary approaches, the enterprise is not locked into a particular vendor's technology.
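As a rough illustration of both points, the sketch below extends a small model without touching the original and then writes the result in N-Triples, one of the standard RDF text formats. The model and sales namespaces and the KeyAccount class are hypothetical; the rdf: and rdfs: URIs are the standard ones. The serializer is deliberately simplified: it treats any object starting with "http://" or "https://" as a URI and everything else as a plain string literal.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

MODEL = "http://example.com/model#"  # hypothetical: the original integration model
SALES = "http://example.com/sales#"  # hypothetical: a department extending it

core_model = {
    (MODEL + "Customer", RDF_TYPE, RDFS_CLASS),
    (MODEL + "Customer", RDFS_LABEL, "Customer"),
}

# Someone other than the original designer adds a more specific class
# simply by publishing one more triple that points at the existing URI.
sales_extension = {
    (SALES + "KeyAccount", RDFS_SUBCLASS_OF, MODEL + "Customer"),
}

extended_model = core_model | sales_extension  # core_model itself is unchanged


def to_ntriples(triples):
    """Serialize (subject, predicate, object) string tuples as N-Triples text."""

    def literal(text):
        # Escape the characters that may not appear raw in an N-Triples string.
        escaped = (
            text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return '"' + escaped + '"'

    lines = []
    for s, p, o in sorted(triples):
        is_uri = o.startswith(("http://", "https://"))
        obj = "<" + o + ">" if is_uri else literal(o)
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines) + "\n"


print(to_ntriples(extended_model), end="")
```

Because the output follows a published format rather than a vendor's own, any tool that reads N-Triples should be able to load it.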
Barriers to Adoption
The Web sparked a revolution in how information is managed in the world at large. The unruly, almost chaotic way in which anyone can put up a Web page challenged our thinking about publishing, libraries, and information management as a whole. Semantic Data Integration represents a similar challenge for the enterprise; conventional wisdom has left control of corporate data in the hands of a small number of professional data managers who make sure that data does not get out of control. But the proliferation of extracurricular data in e-mail and spreadsheets attests to the fact that there is a need for individual workers to have a stronger hand in the management of their data. This tension isn't a result of a Semantic Data Integration approach; it's a real force in the enterprise. Semantic Data Integration is a reasoned approach to engaging with and managing that force for the benefit of the enterprise.
Current & Future State of Semantic Data Integration
The Semantic Web standards have been several years in the making, but are now proving themselves in real data integration situations. In the enterprise, adoption is understandably cautious, as it is with any new technology. But we're seeing successful deployments that exploit the extensibility and flexibility of semantic data integration to create applications that are resilient in the face of fast-changing data requirements.
Non-functional requirements like scalability, privacy, and security are always concerns for a data-intensive technology. While many open source RDF systems offer some assurances in these areas, it is database giant Oracle's entry into the field (with its reputation for non-functional support) that has done the most to calm any uneasiness along these lines.
More and more companies are feeling the pain of disintegrated data in their daily business. As other approaches continue to fail, it's becoming clear that while Semantic Data Integration may not be a silver bullet, it is a revolutionary capability; whoever is the first to master it will dominate their space. Successful adoption of Semantic Data Integration isn't without its problems, but more and more enterprises are turning to Semantic standards to address their enterprise information needs.
Copyright © 2009 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Dean Allemang
Dr. Dean Allemang specializes in innovative applications of knowledge technology. He was awarded his PhD in AI in 1990, worked at five different AI labs in Europe between 1990 and 1996, and co-founded a company in the mid-90s that tried to invent the Semantic Web when the standards were just a gleam in the eye of a few W3C folks. He has twice won the Swiss Technology Prize and has filed two patents on the application of graph matching algorithms to the problems of semantic information interchange. As an internationally recognized expert in the Semantic Web, he participated in the review board for the Digital Enterprise Research Institute, the world's largest Semantic Web research institute. He leads TopQuadrant's successful TopMIND training series, from which he drew much of the inspiration for his recent book (co-authored with Prof. Jim Hendler), Semantic Web for the Working Ontologist.