Big Data and Master Data Management

Single Source of Truth in Big Data

Master Data Management (MDM) is a core aspect of enterprise data governance. It enables the development of a "Single Version of Truth" by providing common descriptions for enterprise-wide entities.

Need for MDM in Big Data Processing
Before Big Data, enterprises generally managed their transaction data in traditional relational databases. One of the biggest strengths of relational databases is their ability to enforce constraints such as check constraints, primary keys, and foreign keys, which help ensure that the captured data is of high quality.

Despite such support for data integrity, enterprises still had duplicates in their master data, which led to inaccurate analytics on that data. For example, an enterprise may target an expensive advertising campaign for a new product at its existing customers; however, because a particular customer may exist under different IDs across multiple systems, the enterprise may end up sending its campaign materials to the same person multiple times.
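The duplicate-customer problem can be illustrated with a minimal Python sketch. It assumes that a normalized name and email address are enough to identify one person, which is a simplification; the record layout, the IDs, and the function names are hypothetical and not taken from any MDM product.

```python
from collections import defaultdict

def match_key(record):
    """Build a normalized key from fields assumed to identify one person."""
    email = record["email"].strip().lower()
    name = " ".join(record["name"].lower().split())
    return (name, email)

def group_duplicates(records):
    """Group record IDs from different systems that share a match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    # Keep only keys that appear under more than one ID.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical records: the same customer held under two IDs in two systems.
records = [
    {"id": "CRM-1001", "name": "Jane  Doe", "email": "Jane.Doe@example.com"},
    {"id": "BILL-77", "name": "jane doe", "email": "jane.doe@example.com "},
    {"id": "CRM-1002", "name": "John Smith", "email": "john.smith@example.com"},
]

duplicates = group_duplicates(records)
print(duplicates)
```

A campaign list built from the grouped keys rather than from raw IDs would reach each person once. Real matching rules are usually richer than an exact match on two normalized fields.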

Similarly, a manufacturing enterprise may analyze problem and complaint records from its customers, but a lack of uniformity in product codes across regions and in problem types may result in inaccurate quantification of the issues.
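As a rough sketch of why uniform product codes matter, the following Python example counts complaints after translating regional codes to a master code. The cross-reference table, the codes, and the function name are all hypothetical; in practice such a mapping would come from the master data itself.

```python
from collections import Counter

# Hypothetical cross-reference from regional product codes to a master code.
MASTER_PRODUCT_CODES = {
    "US-WM-200": "WASHER-200",
    "EU-WM200": "WASHER-200",
    "APAC-W200": "WASHER-200",
    "US-DR-50": "DRYER-50",
}

def count_complaints_by_master_code(complaints):
    """Count complaints per master code; codes with no mapping are returned separately."""
    counts = Counter()
    unmapped = []
    for regional_code in complaints:
        master = MASTER_PRODUCT_CODES.get(regional_code)
        if master is None:
            unmapped.append(regional_code)
        else:
            counts[master] += 1
    return counts, unmapped

# Hypothetical complaint records, reduced to the product code each one cites.
complaints = ["US-WM-200", "EU-WM200", "APAC-W200", "US-DR-50", "LATAM-X1"]
counts, unmapped = count_complaints_by_master_code(complaints)
print(counts, unmapped)
```

Counted by regional code, the washer would appear as three separate products with one complaint each. Counted by master code, it shows three complaints against one product. Codes without a mapping are surfaced rather than silently dropped.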

Enterprises traditionally approach Master Data Management through the following measures.

  • Establishing common descriptions for core business entities across multiple systems, which enables a "single version of the truth"
  • Assessing current master data maturity across the enterprise, identifying the target maturity, and identifying gaps
  • Selecting a Master Data Management tool
  • Building master data models and cleansing data
  • Establishing MDM governance and stewardship
  • Defining an MDM strategy to handle mergers and acquisitions

With the advent of Big Data processing, enterprises started analyzing massive amounts of unstructured data from unconventional sources. As a result, inconsistencies across the data are increasing, and the validation performed at data capture is very limited compared to traditional relational data capture.

For example, if an enterprise wants to target customers on social media, where one customer may be represented on multiple social media forums under different names, the chances of a campaign either overreaching a person or not reaching that person at all are very high. The same is true when microblogging sites are used to analyze the voice of the customer and categorize complaints across products. There is a high possibility that customers misspell product names or use local naming conventions for the same products, which prevents effective analysis.
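One simple way to handle misspelled product names is approximate string matching against a master list. The sketch below uses `difflib.get_close_matches` from the Python standard library; the product names, the 0.8 similarity cutoff, and the function name are illustrative assumptions. It does not cover local naming conventions, which would need an alias table or text analytics rather than spelling similarity.

```python
from difflib import get_close_matches

# Hypothetical master list of product names.
MASTER_PRODUCTS = ["galaxy tab", "thinkpad", "surface pro"]

def resolve_product(mention, cutoff=0.8):
    """Return the closest master product name, or None if nothing is similar enough."""
    normalized = " ".join(mention.lower().split())
    matches = get_close_matches(normalized, MASTER_PRODUCTS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical product mentions pulled from microblog posts.
mentions = ["ThinkPd", "surfce pro", "toaster"]
resolved = {mention: resolve_product(mention) for mention in mentions}
print(resolved)
```

Mentions that resolve to `None` can be routed to a data steward for review instead of being forced onto the nearest product.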

Master Data Management in Big Data
The following are some approaches to integrating MDM data quality solutions into Big Data processing, so that the insights generated from massive quantities of data are accurate for the enterprise.

  • Adopting Hybrid Big Data Solutions: As highlighted in my last article on Hybrid Big Data Solutions, integrating Big Data with existing relational data, which is likely to contain the MDM source databases, is one of the easiest ways to ensure data quality on Big Data sources.
  • Matching More Than Keywords: Massive quantities of unstructured data bring a greater level of ambiguity about the classification and relevance of documents, so mere keyword matching of entities to find the subject of interest is not enough. Most current Big Data examples rely on standard regular expression functions; however, the true potential of Big Data in conjunction with MDM can be achieved only if Text Analytics, rather than just standard regular expressions, is applied to Big Data.
  • Adopting a Data Virtualization Layer: A Data Virtualization platform provides a common hub for capturing data across traditional and Big Data sources, so business rules can be managed at this layer, which helps ensure data quality across disparate data sources.
  • Utilizing the Power of Hadoop Database Extensions: Big Data frameworks like Hadoop provide the ability to keep data in their own file system, HDFS, without transforming it, and the data can be accessed using SQL-like languages. For example, Hive allows data in the Hadoop file system to be read through a SQL interface. Similarly, HBase is a column-oriented database implemented on top of HDFS. These implementations provide ways to check constraints on the underlying Big Data. For example, Hive supports JOINs across tables, which goes a long way toward checking integrity with respect to MDM.
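The JOIN-based integrity check mentioned in the last item can be sketched as follows. To stay self-contained and runnable, the example uses Python's built-in `sqlite3` module as a stand-in for Hive; the table names, column names, and data are hypothetical. The query is an ordinary left outer join that lists rows with no matching master record, a pattern that can also be written in HiveQL.

```python
import sqlite3

# In-memory SQLite database standing in for tables exposed through a SQL interface.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE web_events (event_id INTEGER, customer_id TEXT);
    INSERT INTO master_customer VALUES ('C1', 'Jane Doe'), ('C2', 'John Smith');
    INSERT INTO web_events VALUES (1, 'C1'), (2, 'C9'), (3, 'C2'), (4, NULL);
""")

# Find events whose customer_id has no match in the master table.
orphans = conn.execute("""
    SELECT e.event_id, e.customer_id
    FROM web_events e
    LEFT OUTER JOIN master_customer m ON e.customer_id = m.customer_id
    WHERE m.customer_id IS NULL
    ORDER BY e.event_id
""").fetchall()
print(orphans)
conn.close()
```

Unlike a foreign key in a relational database, this check runs after the data has landed, so it reports violations rather than preventing them.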

Summary
As enterprises continue to adopt Big Data as part of their data management, the biggest challenge will be data quality. Relational databases have done a great job on data integrity, and Big Data should support the same. Implementing traditional Master Data Management on top of Big Data / Unified Data will go a long way toward providing correct insights from Big Data processing.

By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
