Which EII Solution Is Right for You?
Managing integration tasks
By: Peter Chang
Enterprise Information Integration (EII) represents a new category of software that enables disparate data silos to be integrated into a single virtual database for applications. This approach gives developers a powerful tool for simplifying data integration and building flexible applications. If you haven't heard of EII yet, you will soon as the industry rallies around this concept and more EII projects reach deployment.

This article explains why you should consider EII, describes its advantages for data integration, and discusses the different approaches to implementing EII. The article provides a framework for comparing EII products and choosing a suitable product to begin simplifying data integration in your environment.

Data Integration Issues
To combat the inflexibility induced by information fragmentation, developers have many tools for data integration (see Figure 1). There are adapters for accessing data sources, transformation engines for reformatting data, and data warehouses for aggregating data from multiple sources. To integrate data sources, developers use these tools to program the integration requirements into applications.
[Figure 1]

Although this toolkit approach to data integration works, it relies on programming the integration into each application, which has the following deficiencies:
A better solution for data integration is EII. EII supports data integration by enabling multiple data silos to be represented to applications as a single virtual database. Instead of integrating data in the application code, the data integration function is pushed out of the application layer into a new EII tier that sits between applications and data sources. This new tier is dedicated to managing integration tasks such as connecting to a data source, transforming data, and integrating data.

In this framework, the developer or analyst creates a logical data model in the EII server that represents the business view of information (that is, the data integration requirements). The target data sources' physical data models are mapped into the logical data model to create a virtual database schema. Applications interact with data sources through the EII server based on the logical data model. The EII server automatically translates application requests into queries against one or more data sources, integrates the data, and produces results according to the logical view.

EII's holistic approach to data integration addresses the drawbacks of the toolkit approach (see Figure 2). Instead of providing tools for each task, EII establishes a framework that automates the low-level details and exposes a high-level, declarative interface for specifying data integration requirements. Multiple data sources can be integrated without writing any application code. The developer defines the desired logical data model and maps the data sources into the logical model using a GUI tool.
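
To make the idea of a virtual database concrete, here is a minimal sketch in Python. It is not taken from any EII product: the two sources, the `CUSTOMER_VIEW` mapping, and the `query_customer` function are all hypothetical names chosen for illustration. The sketch assumes one relational source and one nonrelational source, and shows a declarative mapping from physical fields to a logical customer record, with the "server" function doing the source access, transformation, and merging on the application's behalf.

```python
import sqlite3

# Physical source 1 (assumed): a relational billing database, in memory here.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE acct (cust_no INTEGER, bal_cents INTEGER)")
billing.executemany("INSERT INTO acct VALUES (?, ?)", [(1, 12550), (2, 0)])

# Physical source 2 (assumed): nonrelational records, e.g. from a CRM service.
crm_records = [
    {"id": 1, "fullName": "Ada Lovelace"},
    {"id": 2, "fullName": "Alan Turing"},
]


def fetch_billing(customer_id):
    """Adapter for the relational source; returns the raw physical fields."""
    row = billing.execute(
        "SELECT bal_cents FROM acct WHERE cust_no = ?", (customer_id,)
    ).fetchone()
    return {"bal_cents": row[0]} if row else {}


def fetch_crm(customer_id):
    """Adapter for the nonrelational source."""
    return next((r for r in crm_records if r["id"] == customer_id), {})


# Declarative mapping: logical field -> (adapter, physical field, transform).
CUSTOMER_VIEW = {
    "name": (fetch_crm, "fullName", str),
    "balance": (fetch_billing, "bal_cents", lambda cents: cents / 100),
}


def query_customer(customer_id, view=CUSTOMER_VIEW):
    """Resolve one logical customer record by querying each mapped source."""
    result = {"customer_id": customer_id}
    fetched = {}  # call each adapter at most once per request
    for logical_field, (fetch, physical_field, transform) in view.items():
        if fetch not in fetched:
            fetched[fetch] = fetch(customer_id)
        raw = fetched[fetch].get(physical_field)
        result[logical_field] = None if raw is None else transform(raw)
    return result
```

An application would call `query_customer(1)` and receive a single record with `name` and `balance`, without knowing which source supplied which field. Adding a third source would mean adding an adapter and mapping entries rather than changing application code, which is the point of moving integration into its own tier.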
[Figure 2]

This end-to-end approach generates several advantages:
How do you get started with EII? As in all emerging markets, EII offers many different approaches to implementing the solution. In general, the available products can be differentiated based on the underlying logical data model, the data transformation framework, and the query interface.

Logical Data Model

Transformation Framework

Query Interface

Although the data model, transformation framework, and query interface can be mixed and matched, in practice, specific transformation frameworks and specific query interfaces work best with specific data models. Therefore, the general approaches to EII can be grouped into three main categories: relational, object, and XML (see Table 1).
[Table 1]

Relational Approach
MetaMatrix is an example of this approach.

Object Approach
Journée is an example of this approach.

XML Approach
BEA Liquid Data for WebLogic, Ipedo, and Nimble are examples of this approach.

Each approach to EII has its advantages and disadvantages. The primary distinctions are based on the data modeling flexibility, the query flexibility, and the result-processing requirements. These differences affect the suitability of each approach to specific applications and developer predilections.

Data Modeling Flexibility
In this regard, the object and XML approaches have an advantage over the relational approach because of their ability to support hierarchical data relationships. Their logical data model can directly represent hierarchical data whereas the relational approach must decompose the data structure into tables. In addition, the object and XML approaches can represent hierarchical data sources like XML, whereas the relational approach requires the developer to reconstruct the hierarchical data in application code. This modeling advantage is important to applications that use data from nonrelational data sources such as message queues, Web services, XML documents, EJBs, and applications.

Query Flexibility
The SQL and XQuery query languages have an advantage over simply retrieving objects by criteria. The language approach better supports projection, aggregation, and joins, which enable fine-grained results to be produced by a query. In contrast, queries that return object collections may return unnecessary data and require additional processing in application code to aggregate and join data.

XQuery has a further advantage over SQL and object retrieval with its enhanced data manipulation facilities. It provides a functional programming language that can express complex transformations against any data structure. It supports built-in and external functions, conditional processing, scripting, and the ability to transform results into any text or binary format.

The distinction in query flexibility is important because the more data processing the EII server can perform, the less code the developer needs to create and maintain. Query flexibility also has implications for performance. A query that returns just the desired data minimizes network traffic and improves the response time. To achieve the smallest result set, the query must maximize the level of data selection, projection, aggregation, joins, and transformation that can be expressed and processed in a single query.
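
The difference in result size can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a query-capable server; the `customer` and `orders` tables, their contents, and both function names are assumptions made up for this example. The first function retrieves whole rows by criteria and then joins and aggregates in application code, while the second expresses the selection, join, and aggregation in one query. Each returns the totals together with the number of rows that had to be transferred.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")


def totals_object_style(region):
    """Retrieve whole records by criteria, then join and aggregate in the app."""
    customers = db.execute(
        "SELECT * FROM customer WHERE region = ?", (region,)
    ).fetchall()
    orders = db.execute("SELECT * FROM orders").fetchall()  # every order is fetched
    totals = {}
    for cust_id, name, _region in customers:
        totals[name] = sum(total for _oid, cid, total in orders if cid == cust_id)
    return totals, len(customers) + len(orders)


def totals_query_style(region):
    """Push selection, join, and aggregation into a single query."""
    rows = db.execute(
        """SELECT c.name, SUM(o.total)
           FROM customer c JOIN orders o ON o.customer_id = c.id
           WHERE c.region = ?
           GROUP BY c.name""",
        (region,),
    ).fetchall()
    return dict(rows), len(rows)
```

With this sample data both functions compute the same total for the West region, but the object-style version transfers four rows and the single-query version transfers one. Note that the inner join omits customers with no orders, whereas the application-side loop would report them with a total of zero; a real comparison would need to settle that difference explicitly.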
Result Processing
The object approach has an advantage over the relational and XML approaches since a native object representation of data is the most convenient to work with. The developer directly accesses the data by simply calling the specific data object's methods. In the relational and XML approaches, the developer must work with generic data structures such as JDBC ResultSets or DOM objects. These require additional work over the object approach to read, update, create, and delete data.

Selecting an Approach
The relational approach works very well with established data programming practices. Developers can get started quickly and leverage traditional techniques and know-how. Although the relational data model is less flexible, the majority of enterprise data resides in relational databases and many nonrelational data sources can be coaxed into a relational format. The pain of accommodating a few nonrelational data sources may be outweighed by the approachability of relational development.

The object approach provides a flexible, logical data model for integrating diverse data sources. For complex environments, this capability greatly simplifies data integration and produces detailed views of enterprise data assets. In addition, the object interface makes data programming more convenient. Fans of object-to-relational mapping tools and object databases will appreciate the object approach. Unfortunately, binding data to program objects is also the weakness of the object approach. This makes query processing less flexible and requires the application developer to process more data. The resulting inefficiency makes the object approach unsuitable for ad hoc query processing where the data binding can't be tuned for performance.

The XML approach is the most cutting-edge architecture for data integration. Using XML as the logical data model and XQuery as the query interface provides a flexible platform for EII.
XML effectively models many enterprise data sources and XQuery provides powerful data processing capabilities. Together these minimize the integration code in applications. These advantages are the most apparent in an environment with heterogeneous data sources and mixed application architectures. Although the XML approach provides a great deal of flexibility, the technology is less mature and requires a shift toward XML-centric application development. For developers ready to experiment and blaze new trails, the payoff will be worthwhile.

Other Considerations

Adapters
If your target data source is not supported, most vendors offer a development kit for building custom adapters. The effort will vary but it won't be trivial, so consider this option carefully.

Update

Security

Caching
There are two kinds of caches. The more conventional is a read cache that saves prior query results. The EII server scans the cache for past results identical to the query before going to back-end data sources and building new results. To avoid stale data, a read cache periodically clears the cached results. Alternatively, the read cache can also be synchronized with the source, which enables cached results to be selectively cleared or refreshed. This keeps the cache hit rate high to reduce query response times.

The other type of cache is a query cache. This approach caches all the data in the logical data format and directly processes arbitrary queries against the cached data. A query cache is like a data mart except with active synchronization. This approach eliminates all the physical-to-logical data mapping and all queries against the underlying data sources during a request. For sophisticated logical data models, a query cache can greatly improve performance. However, a query cache's storage and synchronization requirements can be enormous. Careful data partitioning is critical to keep the cache size manageable and filled with sufficiently current data.
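
The read cache described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular EII server implements caching: the `ReadCache` class, its `ttl` and `clock` parameters, and the `invalidate` method are hypothetical. Results are keyed by the exact query text, expire after a fixed time to limit staleness, and can also be cleared selectively, which stands in for synchronization with the source.

```python
import time


class ReadCache:
    """Caches results keyed by the exact query; entries expire after ttl seconds."""

    def __init__(self, run_query, ttl=60.0, clock=time.monotonic):
        self._run_query = run_query  # function that goes to the back-end sources
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be exercised without waiting
        self._entries = {}  # query text -> (expiry time, result)
        self.hits = 0
        self.misses = 0

    def get(self, query):
        """Return a cached result for an identical query, else build a new one."""
        entry = self._entries.get(query)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self._run_query(query)
        self._entries[query] = (self._clock() + self._ttl, result)
        return result

    def invalidate(self, predicate):
        """Selectively clear entries, e.g. when a source reports a change."""
        for query in [q for q in self._entries if predicate(q)]:
            del self._entries[query]
```

A query cache, by contrast, would hold the logical data itself and evaluate arbitrary queries against it, so it cannot be reduced to a lookup keyed by query text in the way this sketch is.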
A read cache and a query cache can be used in combination to better match data usage requirements. A read cache is best for situations with predictable and repetitive queries, while a query cache is best for environments with ad hoc queries against relatively static data sources.

Summary
A starting point for evaluating the suitability of EII products is to consider the three general approaches to EII: relational, object, and XML. Each approach has its advantages and disadvantages, so the best solution depends on the specific requirements. In general: