Welcome!

Java IoT Authors: Liz McMillan, Elizabeth White, Pat Romanski, Yeshim Deniz, Frank Lupo

Related Topics: Microservices Expo, Java IoT, Containers Expo Blog, Agile Computing, @CloudExpo, Apache

Microservices Expo: Article

Big Data Analytics: Thinking Outside of Hadoop

UIMA and Hadoop in Big Data Analytics

Big Data Predictions
In the recent release of '2012 Hype Cycle Of Emerging Technologies,' research analyst Gartner evaluated several technologies to come up with a list of technologies that will dominate the future . "Big Data" related technologies form a significant portion of the list, in particular the following technologies revolve around the concept and usage of Big Data.

  • Social Analytics: This analytics allow marketers to identify sentiment and identify trends in order to accommodate the customer better.
  • Activity Streams: Activity Streams are the future of enterprise collaboration, uniting people, data, and applications in real-time in a central, accessible, virtual interface. Think of a company social network where every employee, system, and business process exchanged up-to-the-minute information about their activities and outcomes
  • Natural Language Question Answering: NLP is a technique for analysing naturally occurring texts as part of linguistic processing to achieve human like language processing.
  • Video Analytics: forming a conceptual and contextual understanding of the content of the video .
  • Context Enriched Services: Context-enriched services use information about a person or an object to proactively anticipate the user's need and serve up the content.

These areas are just representative but in general many of the emerging technologies revolve around the ability to process large amounts of data from hither to unconventional sources and extract meaning out of them.

Doing a general search on Big Data on Google, or any other technical forum, we find that Big Data is almost synonymous with Hadoop, the main reason being that  the storage and computational operations in Hadoop are parallel by design. Hadoop is highly scalable. Data can be accessed and operated upon without interdependencies until the results need to be reduced - and even the reduction itself can be performed in parallel. The result is that large amounts of data can be processed concurrently by many servers, greatly reducing the time to obtain results.

MapReduce is the basic component of Hadoop. It is a parallel programming framework that processes unordered data. All data are composed of a key and an associated value and processing occurs mainly in two phases, the Map phase and the Reduce phase.

However, careful analysis of the above technologies like Social Analyics, Video Analytics, and NLP, reveal that all of them definitely needed the massively parallel processing power of Hadoop, but they also need the level of intelligence that extracts meaningful information such that the Hadoop's Map & Reduce functions are effective to provide the required insight.

UIMA and Big Data Analytics
Hadoop in its raw form will not solve all the insight required for Big Data processing like in the technologies mentioned earlier. This is evident from the fact that most of the tutorials and examples on Hadoop Big Data Processing are about counting words or parsing text streams for specific words, etc. Naturally these examples are good from an academic perspective but may not solve the real life needs of Big Data processing and the associated insight needed for the enterprises.

UIMA stands for Unstructured Information Management Architecture. It is a component software architecture for the development, discovery, composition and deployment of multi-modal analytics for the analysis of unstructured information and its integration with search and knowledge management technologies.

The UIMA architecture supports the development, discovery, composition and deployment of multi-modal analytics, including text, audio and video.

Once the initial processors understands Video, Audio and other media documents like email and creates textual meaning out of it like the string of tokens or other patterns, these values can be parsed by various annotators of UIMA pipe line.

The parser subcomponent is responsible for converting the crawled document in its native format .

UIMA is an architecture in which basic building blocks called Analysis Engines (AEs) are composed to analyze a document and infer and record descriptive attributes about the document as a whole, and/or about regions therein. This descriptive information, produced by AEs is referred to generally as analysis results. Analysis results typically represent meta-data about the document content. One way to think about AEs is as software agents that automatically discover and record meta-data about original content.

Analysis Engines are constructed from building blocks called Annotators. An annotator is a component that contains analysis logic. Annotators analyze an artifact (for example, a text document) and create additional data (metadata) about that artifact. It is a goal of UIMA that annotators need not be concerned with anything other than their analysis logic - for example the details of their deployment or their interaction with other annotators.

An Analysis Engine (AE) may contain a single annotator (this is referred to as a Primitive AE), or it may be a composition of others and therefore contain multiple annotators (this is referred to as an Aggregate AE).

Some of the examples of Annotators could be :

  • Language Identification annotator
  • Linguistic Analysis annotator
  • Dictionary Lookup annotator
  • Named Entity Recognition annotator
  • Pattern Matcher annotator
  • Classification Module annotator
  • Custom annotators

While a detailed description of Annotators in processing unstructured data is beyond the scope of this article, you can appreciate the power of Annotator with one specific example below.

The OpenCalais Annotator component wraps the OpenCalais web service and makes the OpenCalais analysis results available in UIMA. OpenCalais can detect a large variety of entities, facts and events like for example Persons, Companies, Acquisitions, Mergers, etc.

Summary
As is evident from the above facts, frameworks like UIMA extend the Big Data processing towards much more meaningful insights and map them to real world scenarios. While the Massively Parallel Processing abilities of Hadoop will be a key factor in Big Data initiatives, it is not alone enough and frameworks like UIMA will be playing a much larger part.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).

@ThingsExpo Stories
As businesses evolve, they need technology that is simple to help them succeed today and flexible enough to help them build for tomorrow. Chrome is fit for the workplace of the future — providing a secure, consistent user experience across a range of devices that can be used anywhere. In her session at 21st Cloud Expo, Vidya Nagarajan, a Senior Product Manager at Google, will take a look at various options as to how ChromeOS can be leveraged to interact with people on the devices, and formats th...
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
SYS-CON Events announced today that Nihon Micron will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Nihon Micron Co., Ltd. strives for technological innovation to establish high-density, high-precision processing technology for providing printed circuit board and metal mount RFID tags used for communication devices. For more inf...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
SYS-CON Events announced today that TidalScale, a leading provider of systems and services, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale has been involved in shaping the computing landscape. They've designed, developed and deployed some of the most important and successful systems and services in the history of the computing industry - internet, Ethernet, operating s...
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
Infoblox delivers Actionable Network Intelligence to enterprise, government, and service provider customers around the world. They are the industry leader in DNS, DHCP, and IP address management, the category known as DDI. We empower thousands of organizations to control and secure their networks from the core-enabling them to increase efficiency and visibility, improve customer service, and meet compliance requirements.
SYS-CON Events announced today that TidalScale will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale is the leading provider of Software-Defined Servers that bring flexibility to modern data centers by right-sizing servers on the fly to fit any data set or workload. TidalScale’s award-winning inverse hypervisor technology combines multiple commodity servers (including their ass...
As hybrid cloud becomes the de-facto standard mode of operation for most enterprises, new challenges arise on how to efficiently and economically share data across environments. In his session at 21st Cloud Expo, Dr. Allon Cohen, VP of Product at Elastifile, will explore new techniques and best practices that help enterprise IT benefit from the advantages of hybrid cloud environments by enabling data availability for both legacy enterprise and cloud-native mission critical applications. By rev...
As popularity of the smart home is growing and continues to go mainstream, technological factors play a greater role. The IoT protocol houses the interoperability battery consumption, security, and configuration of a smart home device, and it can be difficult for companies to choose the right kind for their product. For both DIY and professionally installed smart homes, developers need to consider each of these elements for their product to be successful in the market and current smart homes.
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant tha...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, will lead you through the exciting evolution of the cloud. He'll look at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering ...
SYS-CON Events announced today that N3N will exhibit at SYS-CON's @ThingsExpo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. N3N’s solutions increase the effectiveness of operations and control centers, increase the value of IoT investments, and facilitate real-time operational decision making. N3N enables operations teams with a four dimensional digital “big board” that consolidates real-time live video feeds alongside IoT sensor data a...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...