Big Data and Master Data Management

Single Source of Truth in Big Data

Master Data Management (MDM) is an important aspect of enterprise data governance. It establishes a "Single Version of Truth" by providing common descriptions for enterprise-wide entities.

Need for MDM in Big Data Processing
Before Big Data, enterprises generally managed their transaction data in traditional relational databases. One of the biggest strengths of relational databases is their ability to enforce constraints such as check constraints, primary keys, and foreign keys, which help ensure that the captured data is valid and consistent.

In spite of such support for data integrity, enterprises still had duplicates in their master data, which led to inaccurate analytics on that data. For example, an enterprise may target an expensive advertising campaign for a new product at its existing customers; however, because a particular customer may exist under different IDs across multiple systems, the enterprise may send its campaign materials to the same person multiple times.

Similarly, a manufacturing enterprise may analyze problem and complaint records from its customers, but a lack of uniformity in product codes across regions and in problem types may result in inaccurate quantification of the issues.
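
The duplicate-customer problem described above can be illustrated with a minimal Python sketch. The record layout, the system names, and the choice of a normalized e-mail address as the match key are all assumptions made for illustration; real MDM tools typically apply much richer matching rules.

```python
from collections import defaultdict

def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()

def merge_customers(records):
    """Group source records from several systems under one match key.

    Hypothetical match rule: two records describe the same customer
    when their normalized e-mail addresses are equal.
    """
    merged = defaultdict(list)
    for rec in records:
        merged[normalize_email(rec["email"])].append((rec["system"], rec["id"]))
    return dict(merged)

# Hypothetical records: the same person has one ID in CRM and another in billing.
records = [
    {"system": "crm", "id": "C-1001", "email": "Jane.Doe@example.com"},
    {"system": "billing", "id": "B-77", "email": " jane.doe@example.com "},
    {"system": "crm", "id": "C-1002", "email": "raj@example.com"},
]

golden = merge_customers(records)
print(len(records), "source records ->", len(golden), "campaign recipients")
```

With the duplicates merged, the campaign in this toy example goes to two recipients rather than three.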

Enterprises have traditionally approached Master Data Management through the following measures:

  • Establishing common descriptions for core business entities across multiple systems to enable a "single version of the truth"
  • Assessing current master data maturity across the enterprise, identifying the target maturity, and identifying gaps
  • Selecting a Master Data Management tool
  • Building master data models and cleansing data
  • Establishing MDM governance and stewardship
  • Defining an MDM strategy to handle mergers and acquisitions

With the advent of Big Data processing, enterprises started analyzing massive amounts of unstructured data from unconventional sources. As a result, inconsistencies across the data are increasing, and far less validation is performed at data capture than with traditional relational data.

For example, if an enterprise wants to target customers on social media, one customer may be represented on multiple social media forums under different names, so the chances of a campaign either reaching the same person repeatedly or not reaching them at all are very high. The same is true when microblogging sites are used to analyze the voice of the customer and categorize complaints by product. There is a high possibility that customers will misspell product names or use local naming conventions for the same products, which prevents effective analysis.
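
One way to picture the misspelled-product-name problem is to map each free-text mention to the closest entry in a master product list. The sketch below uses Python's standard `difflib` module for approximate string matching; the product names, the `canonical_product` helper, and the 0.8 similarity cutoff are hypothetical choices for illustration, not a recommended configuration.

```python
import difflib

# Hypothetical master list of product names (the MDM "golden" values).
MASTER_PRODUCTS = ["PowerDrill 3000", "AquaPump Mini", "SolarLamp Pro"]

# Compare in lowercase, but return the canonical spelling.
_LOOKUP = {name.lower(): name for name in MASTER_PRODUCTS}

def canonical_product(mention: str, cutoff: float = 0.8):
    """Return the master product name closest to `mention`, or None.

    `cutoff` is the minimum similarity ratio (0..1) accepted as a match.
    """
    hits = difflib.get_close_matches(
        mention.strip().lower(), list(_LOOKUP), n=1, cutoff=cutoff
    )
    return _LOOKUP[hits[0]] if hits else None

for mention in ["powerdril 3000", "Aqua Pump mini", "toaster"]:
    print(mention, "->", canonical_product(mention))
```

Mentions that do not come close to any master entry return `None`, so they can be routed to a data steward instead of being silently miscategorized.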

Master Data Management in Big Data
The following are some approaches to integrating MDM data quality solutions into Big Data processing, so that the insights generated from massive quantities of data are accurate for the enterprise.

  • Adopting Hybrid Big Data Solutions: As highlighted in my last article on Hybrid Big Data Solutions, integrating Big Data with existing relational data, which is likely to contain the MDM source databases, is one of the easiest ways to ensure data quality in Big Data sources.
  • Matching More Than Keywords: Massive quantities of unstructured data bring greater ambiguity about the classification and relevance of documents, so mere keyword matching of entities is not enough to identify the subject of interest. Most current Big Data examples rely on standard regular expression functions; however, the true potential of Big Data in conjunction with MDM can be achieved only if text analytics, rather than regular expressions alone, is applied to Big Data.
  • Adopting a Data Virtualization Layer: A data virtualization platform provides a common hub for capturing data across traditional and Big Data sources, so business rules can be managed at this layer to ensure data quality across disparate data sources.
  • Utilizing the Power of Hadoop Database Extensions: Big Data frameworks like Hadoop can keep data in their own file system, HDFS, without transforming it, and the data can be accessed using SQL-like languages. For example, Hive allows data in the Hadoop file system to be read through a SQL interface. Similarly, HBase is a column-oriented database implemented on top of HDFS. These layers make it possible to apply integrity checks to the underlying Big Data. For example, Hive supports JOINs across tables, which goes a long way toward checking integrity against MDM reference data.
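
The JOIN-based integrity check mentioned in the last item can be sketched as follows. To keep the example runnable without a Hadoop cluster, it uses Python's built-in `sqlite3` module as a stand-in for Hive; the `product_master` and `complaints` tables and their contents are hypothetical. A query of this shape should look similar in HiveQL, but that is an assumption and has not been tested against Hive here.

```python
import sqlite3

# In-memory SQLite database standing in for Hive tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_master (product_code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE complaints (complaint_id INTEGER, product_code TEXT, text TEXT);

INSERT INTO product_master VALUES
    ('PD-3000', 'PowerDrill 3000'),
    ('AP-MINI', 'AquaPump Mini');

INSERT INTO complaints VALUES
    (1, 'PD-3000', 'chuck slips'),
    (2, 'PD3000',  'battery dies'),   -- code does not exist in master data
    (3, 'AP-MINI', 'leaks');
""")

# Outer join plus NULL filter: keep only complaints whose product code
# has no matching row in the master table.
ORPHAN_QUERY = """
SELECT c.complaint_id, c.product_code
FROM complaints c
LEFT OUTER JOIN product_master m
    ON c.product_code = m.product_code
WHERE m.product_code IS NULL
ORDER BY c.complaint_id
"""

orphans = conn.execute(ORPHAN_QUERY).fetchall()
print(orphans)
```

Rows returned by the query are complaints that cannot be attributed to a known product and would need cleansing before any per-product analysis.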

Summary
As enterprises continue to adopt Big Data as part of their data management, the biggest challenge will be data quality. Relational databases have done a great job of enforcing data integrity, and Big Data platforms should support the same. Implementing traditional Master Data Management on top of Big Data / Unified Data will go a long way toward providing correct insights from Big Data processing.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
