

How to Get More ROI from Big Data By @NitinBandugula | @BigDataExpo [#BigData]



Many businesses run a Hadoop Big Data system dedicated to a specific use case. Perhaps they are collecting call center records, analyzing sensor reports from the factory floor, or monitoring tweets to track customer experience in real time.

Confining Big Data-driven projects to select initiatives made sense initially, as many early Big Data analysis solutions were optimized for a limited set of use cases. But solution options have matured and expanded, as have the data sources businesses draw from. To get the best out of your Big Data investment now - and take full advantage of the famous "three Vs of Big Data": volume, variety, and velocity - you'll want to begin planning your shift from the single-use-case stage to a multiple-use-case scenario.

Expand Your Use Case Options with Apache Drill
Utilizing data-driven intelligence across the enterprise requires solutions that enable interactive, self-service ways of working with historical and near real-time data. The core Hadoop platform has already solved many of the fundamental Big Data access and availability problems. With the addition of Apache Drill, a standalone query engine, data analysts finally have the freedom to follow their queries easily across multiple data sources, on demand.

Apache Drill was designed to support a wide range of SQL use cases on Big Data. Drill is particularly well suited to situations that require low-latency performance, including interactive query environments (OLAP, self-service BI, data visualization), investigative analytics (data science and exploration), and Day Zero analytics on near real-time data. It enables efficient analytics operations across a range of data sources and formats, including JSON files, Parquet files, and HBase tables.
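To make this concrete, here is a minimal sketch of submitting such a query through a drillbit's REST endpoint (Drill's web interface listens on port 8047 by default). The file path and field names are hypothetical, and the HTTP call itself is left commented out since it assumes a running Drill instance:

```python
# Sketch: sending an ANSI SQL query to Drill's REST API. The JSON file path
# and the name/address fields are hypothetical examples.
import json
import urllib.request  # used only in the commented-out call below


def build_query_payload(sql):
    """Drill's /query.json endpoint takes a small JSON envelope."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode()


# Drill queries the JSON file in place -- no table definition required,
# and nested fields are reachable with dot notation.
sql = """
SELECT t.name, t.address.city AS city
FROM dfs.`/data/customers.json` AS t
WHERE t.address.state = 'CA'
"""

payload = build_query_payload(sql)
# With a drillbit running locally, the query would be submitted like so:
# req = urllib.request.Request(
#     "http://localhost:8047/query.json", data=payload,
#     headers={"Content-Type": "application/json"})
# rows = json.load(urllib.request.urlopen(req))["rows"]
```

Note that the query targets the file directly through the `dfs` storage plugin; there is no DDL step before the first SELECT.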

Drill's efficiency across multiple use cases comes in great part from its architecture. Drill is built on hierarchically organized modules called drillbits, which are responsible for executing SQL statements. A drillbit is installed on each node that holds data, and is capable of executing SQL queries on the data that it manages. When data is stored across many nodes, all applicable drillbits process the query, parallelizing its execution. Applications accessing Drill are "connected" to different drillbits, avoiding availability bottlenecks and ensuring data locality.
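The scatter-gather idea behind this architecture can be illustrated with a short, purely conceptual Python sketch (the node names and records are invented; this is not Drill's API): each "drillbit" filters the slice of data it holds locally, in parallel, and the partial results are merged.

```python
# Conceptual illustration of parallel, data-local query execution.
from concurrent.futures import ThreadPoolExecutor

# Each node holds its own slice of the data (data locality).
NODE_DATA = {
    "node-1": [{"region": "west", "sales": 120}, {"region": "east", "sales": 80}],
    "node-2": [{"region": "west", "sales": 200}],
    "node-3": [{"region": "east", "sales": 50}, {"region": "west", "sales": 30}],
}


def drillbit_execute(node, predicate):
    """Run the query fragment only on the data this node manages."""
    return [row for row in NODE_DATA[node] if predicate(row)]


def run_query(predicate):
    """Fan the fragment out to every applicable node in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda n: drillbit_execute(n, predicate), NODE_DATA)
    return [row for part in partials for row in part]


west_sales = run_query(lambda row: row["region"] == "west")
print(sum(row["sales"] for row in west_sales))  # 120 + 200 + 30 = 350
```

The real engine does far more (plan fragments, exchanges, columnar execution), but the shape is the same: work moves to the data, not the other way around.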

Self-Service Data Exploration On-Demand
Drill is the only SQL engine for Hadoop that doesn't demand that schemas be created and maintained, or that data be transformed, before it can be queried. Data analysts can query data in its native formats, including nested data, self-describing data, and data with dynamic schemas. There is no need to explicitly define and maintain schemas; Drill automatically leverages the structure embedded in the data. Self-service data exploration is finally a reality: data can be worked with immediately upon arrival, with no need to prepare a schema, and analysts can change and expand their data sources on the fly without waiting for IT to structure newly requested data.
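The schema-on-read idea is easy to see in a toy Python sketch (the records are illustrative; this is not Drill's implementation): the structure is discovered from the self-describing records themselves, so querying can begin immediately.

```python
# Schema-on-read in miniature: no CREATE TABLE step; the "schema" is simply
# whatever fields the records turn out to contain.
import json

raw = """
{"name": "Ada", "visits": 3}
{"name": "Grace", "visits": 5, "country": "US"}
"""

records = [json.loads(line) for line in raw.strip().splitlines()]

# The observed schema is the union of fields seen across records.
schema = sorted({field for rec in records for field in rec})
print(schema)  # -> ['country', 'name', 'visits']

# Query immediately, treating fields absent from a record as NULL (None).
result = [(r["name"], r.get("country")) for r in records if r["visits"] > 2]
```

Note how the second record carries a field the first one lacks; nothing breaks, and no one had to file a ticket to alter a table definition first.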

Analysts can also leverage their existing SQL skills and BI tools to directly query self-describing data and process complex data types. Of course, Hadoop hasn't lacked for SQL or SQL-comparable solutions - but many were designed from a historical perspective, reengineering old-school tools for Big Data usage. These projects filled a real need, but solutions must now be built to support the myriad data-producing sources we now utilize, as well as the ways that we transform Big Data into actionable intelligence.

Drill has been tested by the open source community, and it was designed to be extensible. New data sources, new file formats, new operators, and new query languages can be easily added via user-defined functions or custom storage plugins for traditional data sources.
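The storage-plugin idea can be sketched as a simple registry (the names here are illustrative, not Drill's actual plugin interface): a new format is supported by registering a reader for it, with no changes to the engine core.

```python
# Sketch of format extensibility via a reader registry.
import json

READERS = {}


def register_reader(fmt):
    """Decorator that registers a reader function for a given format name."""
    def wrap(fn):
        READERS[fmt] = fn
        return fn
    return wrap


@register_reader("csv")
def read_csv(text):
    rows = [line.split(",") for line in text.strip().splitlines()]
    header, *body = rows
    return [dict(zip(header, r)) for r in body]


@register_reader("json")
def read_json(text):
    return [json.loads(line) for line in text.strip().splitlines()]


def scan(fmt, text):
    # Dispatch to whichever registered reader claims this format.
    return READERS[fmt](text)
```

Supporting a hypothetical new format would mean adding one more decorated reader; `scan` and everything above it stay untouched.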

Drill: The Future of Big Data Exploration
Apache Drill was initially inspired by Google's Dremel project, and the open source community has worked hard to develop Drill into the ideal interactive SQL engine for Hadoop. The success of these efforts was recently acknowledged officially by the Apache Software Foundation, which announced in December 2014 that it had promoted Drill to a top-level project.

As a top-level project, Drill joins other illustrious projects such as Apache Hadoop and httpd (the world's most popular Web server). Drill now has its own Project Management Committee, and users can be confident that the project has proven itself, has a viable roadmap for development, and can be deployed for mission-critical use in the long term.

If you're ready to test-drive Drill, you can do so using the MapR Sandbox for Hadoop, which runs on PC, Mac, and Linux platforms. MapR Technologies is the provider of the top-ranked distribution for Apache Hadoop.

You can also view a tutorial on analyzing real-world data using Apache Drill.

More Stories By Nitin Bandugula

As a Sr. Product Marketing Manager at MapR, Nitin brings his engineering, business, and management skills together to market technology products. At MapR, Nitin focuses on SQL, batch and in-memory frameworks, and streaming technologies on Hadoop. Prior to MapR, Nitin worked for enterprise companies and startups in various roles, including Engineering, Product Management, and Management Consulting. Nitin holds a Master's degree in Computer Science from the Illinois Institute of Technology and an MBA from the Johnson School at Cornell University.
