Big Data Analytics By @TheEbizWizard | @CloudExpo #BigData

Big Data Analytics Raises the Bar for Data Preparation

Had Mark Twain lived today, we might hear him utter the oath “lies, damn lies, and analytics.” Statistics, to be sure, may still be used to distort the truth, but with the sudden explosion of big data, analytics now threaten to do the same.

I’m not talking about intentional distortion here – that’s another story entirely. Rather, the risk of unintentional distortion via data analytics is growing as the sheer quantity of data increases, along with the availability and usability of the analytics tools on the market.

The data scientists themselves aren’t the problem. In fact, the more qualified data scientists we have, the better. But there aren’t enough of these rare professionals to go around.

Furthermore, the ease of use and availability of increasingly mature analytics and other business intelligence (BI) tools are opening up the world of “hands on” analytics to an increasingly broad business audience – few of whom have any particular training in data science.

Are today’s BI tools to blame for this problem? Not really – after all, the tools are unquestionably getting better and better. The root of the problem is data preparation.

After all, the smartest analytics tool in the world can only do so much with poorly organized, incomplete, or incorrect input data – the proverbial garbage-in, garbage-out problem, now compounded by the diversity of data types, levels of structure, and overall context challenges that today’s big data represent.

’Twas not always thus. Back in the good old first-generation data warehouse days, data preparation tasks were more straightforward, and the people responsible for tackling these activities did so for a living.

Now, data preparation is more diverse and challenging, and we’re asking data laypeople to shoehorn big data sets as best they can into their newfangled analytics tools. No wonder the end result can be such a mess.

A Closer Look at Data Preparation

Integrating multiple data sources, either by physically moving them or via data virtualization, typically involves data preparation. Traditional preparation tasks often include:

  • Bringing basic metadata like column names and numeric value types into a consistent state, for example, by renaming columns or changing all numbers into the same kind of integer.
  • Rudimentary data transformations, for example, taking a field that contains people’s full names and splitting them into first name and last name fields.
  • Making sure missing values are handled consistently. Is a missing value the same as an empty string, or perhaps the dreaded NULL?
  • Routine aggregation tasks, like counting all the records in a particular ZIP Code and entering the total into a separate field.
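
To make these traditional tasks concrete, here is a minimal sketch in plain Python. The records, field names, and the set of missing-value markers are all invented for illustration; a real data set will have its own conventions, and the name split is deliberately naive.

```python
from collections import Counter

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"Full Name": "Ada Lovelace", "ZIP": "02139", "Age": "36"},
    {"Full Name": "Alan Turing", "ZIP": "02139", "Age": ""},
    {"Full Name": "Grace Hopper", "ZIP": "10001", "Age": "NULL"},
]

# Assumed conventions for "no value"; adjust to whatever the source actually uses.
MISSING_MARKERS = {"", "NULL", "N/A"}


def prepare(record):
    """Normalize one raw record into a consistent shape."""
    # Split the full name on the first space (naive: breaks on multi-part names).
    first, _, last = record["Full Name"].partition(" ")
    age_text = record["Age"].strip()
    # Map every missing-value marker to None and every number to the same int type.
    age = None if age_text in MISSING_MARKERS else int(age_text)
    return {
        "first_name": first,
        "last_name": last,
        "zip_code": record["ZIP"],
        "age": age,
    }


prepared = [prepare(r) for r in raw_records]

# Routine aggregation: count the records per ZIP Code and store the total
# in a separate field on each record.
zip_counts = Counter(r["zip_code"] for r in prepared)
for r in prepared:
    r["zip_total"] = zip_counts[r["zip_code"]]
```

Note that the empty string and the literal text NULL both end up as None here. That is a decision, not a given, and it is exactly the kind of choice that has to be made once and applied everywhere.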

So far so good – while a data expert will have no problems with these tasks, many an Excel-savvy business analyst can tackle them without distorting results as well.

However, when big data enter the picture, data preparation becomes more complicated, as the variety of data structure and the volume of information increase. Additional data preparation activities may now include:

  • Data wrangling – the manual conversion of data from one raw form to another, especially when the data aren’t in a tabular format. What do you do if your source data contain, say, video files, Word documents, and Twitter streams, all mixed together?
  • Semantic processing – extracting entities from textual data, for example, identifying people and place names. Semantic processing may also include the resolution of ambiguities, for example, recognizing whether “Paris Hilton” is a socialite or a hotel.
  • Mathematical processing – yes, even statistics may be useful here. There are numerous mathematical approaches for identifying clusters or other patterns in information that will help with further analysis.
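
As one small illustration of the mathematical processing item, the sketch below runs Lloyd's k-means algorithm on one-dimensional data. It is only one of many clustering approaches, and the order values and starting centers are hypothetical; choosing the number of clusters and the starting points well is where the statistical expertise comes in.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's k-means algorithm in one dimension, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical order values with two obvious groups.
order_values = [9.0, 10.0, 11.0, 98.0, 100.0, 102.0]
centers, clusters = kmeans_1d(order_values, centers=[0.0, 50.0])
```

With this toy input the two groups separate cleanly. Real data rarely cooperates this well, which is part of why an untrained user can get a plausible-looking but wrong grouping.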

It’s important to note that the challenge with these more advanced data preparation techniques isn’t simply that inexperienced people won’t be able to perform them. The worry is that they will think they are properly preparing the data, when in fact they are doing it wrong. The end result will hopefully be obviously incorrect, but an even more dangerous scenario is when the final analysis seems correct but in reality is not.
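
A tiny worked example shows how easily a plausible but wrong result can arise. The revenue figures below are hypothetical. The mistake is treating a missing value as zero: the code runs without complaint and produces a tidy number.

```python
from statistics import mean

# Hypothetical monthly revenue figures; None marks a month with no report.
reported = [120.0, 130.0, None, 125.0, None]

# Mistake: treat "missing" as zero. Nothing fails, and the result looks reasonable.
naive_average = mean(v if v is not None else 0.0 for v in reported)

# More defensible: average only the observed values,
# and keep track of how many values were missing.
observed = [v for v in reported if v is not None]
careful_average = mean(observed)
missing_count = len(reported) - len(observed)

print(naive_average)    # 75.0
print(careful_average)  # 125.0
print(missing_count)    # 2
```

Neither number is self-evidently absurd, so nothing in the output warns the analyst that the first one understates revenue by 40 percent.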

Addressing Data Preparation Challenges

A common knee-jerk reaction to the scenarios described above is simply to establish rules to prevent unqualified users from monkeying with data preparation tasks in the first place. However, such draconian data governance measures typically have no place in a modern data-centric business environment.

The better approach is to provide additional data preparation and data integration tooling that data professionals may configure, but business analysts and other business users may use to prepare data for themselves. In other words, establish a governed, self-service model for data preparation.

For example, data professionals can preconfigure the reusable Snaps from SnapLogic so that they can handle the messier details of data preparation, as well as data access and other transformation tasks. The broader audience of users can then assemble data pipelines simply by snapping together the Snaps. See the illustration below for a SnapLogic pipeline that these “citizen integrators” can create to combine data.

[Figure: SnapLogic Pipeline (source: SnapLogic)]

It’s also possible to create nested sub-pipelines, so that business users combine preassembled, preconfigured sub-pipelines and Snaps into larger data integration pipelines. Such pipelines can be made up of many levels of nested sub-pipelines, and SnapLogic can guarantee the delivery of data from each sub-pipeline (much the same as traditional queues offer guaranteed delivery, extended to many other types of Snaps).
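
The nesting idea can be sketched generically in Python. To be clear, this is not SnapLogic's API or how Snaps are actually built; every function name below is hypothetical, and the sketch does not model guaranteed delivery. It only shows the pattern: steps that an expert configures once, composed into a sub-pipeline, which can then be dropped into a larger pipeline like any other step.

```python
from functools import reduce


def pipeline(*steps):
    """Compose steps into one callable; a pipeline can itself be used as a step."""
    def run(rows):
        return reduce(lambda data, step: step(data), steps, rows)
    return run


# Steps a data professional might preconfigure (names are hypothetical).
def drop_incomplete(rows):
    """Discard any record that has a missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]


def normalize_region(rows):
    """Bring region labels into one consistent spelling."""
    return [{**r, "region": r["region"].strip().upper()} for r in rows]


def total_by_region(rows):
    """Aggregate amounts per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals


# A preassembled sub-pipeline, nested inside a larger pipeline by a business user.
clean = pipeline(drop_incomplete, normalize_region)
report = pipeline(clean, total_by_region)

sales = [
    {"region": " east", "amount": 10},
    {"region": "East ", "amount": 5},
    {"region": "west", "amount": None},
]
result = report(sales)
```

The business user who assembles `report` never touches the cleaning logic; they only decide where `clean` goes.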

SnapLogic also offers a sub-pipeline review, allowing both experts and business users to see the processing steps within each sub-pipeline, as well as relevant data governance capabilities that support this self-service data preparation approach. For example, it offers a lifecycle management feature that allows for the comparison and testing of Snaps and sub-pipelines before business users get their hands on them.

The Intellyx Take

In the case of SnapLogic, it falls to the data integration layer to resolve the challenges with data preparation. In truth, SnapLogic is essentially a data integration tooling vendor – but there is an important lesson here: data preparation is in reality an aspect of data integration, and in fact, data governance is part of the data integration story as well.

As enterprises leverage big data across their organizations, it becomes increasingly important to support the full breadth of personnel who will be working with such information, in order to get the best results from the resulting analysis. Leveraging data preparation capabilities like those found in SnapLogic’s pipelines is a critical enabler of useful, accurate data analysis.

SnapLogic is an Intellyx client, but Intellyx retains full editorial control of this article.

More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.
