|By Nikita Ivanov||
|December 29, 2014 12:00 PM EST||
A few months ago, I spoke at the conference where I explained the difference between caching and an in-memory data grid. Today, having realized that many people are also looking to better understand the difference between two major categories in in-memory computing: In-Memory Database and In-Memory Data Grid, I am sharing the succinct version of my thinking on this topic - thanks to a recent analyst call that helped to put everything in place
Skip to conclusion to get the bottom line.
Let's clarify the naming and buzzwords first. In-Memory Database (IMDB) is a well-established category name and it is typically used unambiguously.
It is important to note that there is a new crop of traditional databases with serious In-Memory "options". That includes MS SQL 2014, Oracle's Exalytics and Exadata, and IBM DB2 with BLU offerings. The line is blurry between these and the new pure In-Memory Databases, and for the simplicity I'll continue to call them In-Memory Databases.
In-Memory Data Grids (IMDGs) are sometimes (but not very frequently) called In-Memory NoSQL/NewSQL Databases. Although the latter can be more accurate in some case - I am going to use the In-Memory Data Grid term in this article, as it tends to be the more widely used term.
Note that there are also In-Memory Compute Grids and In-Memory Computing Platforms that include or augment many of the features of In-Memory Data Grids and In-Memory Databases.
Confusing, eh? It is... and for consistency - going forward we'll just use these terms for the two main categories:
- In-Memory Database
- In-Memory Data Grid
It is also important to nail down what we mean by "In-Memory". Surprisingly - there's a lot of confusion here as well as some vendors refer to SSDs, Flash-on-PCI, Memory Channel Storage, and, of course, DRAM as "In-Memory".
In reality, most vendors support a Tiered Storage Model where some portion of the data is stored in DRAM (the fastest storage but with limited capacity) and then it gets overflown to a verity of flash or disk devices (slower but with more capacity) - so it is rarely a DRAM-only or Flash-only product. However, it's important to note that most products in both categories are often biased towards mostly DRAM or mostly flash/disk storage in their architecture.
Bottom line is that products vary greatly in what they mean by "In-Memory" but in the end they all have a significant "In-Memory" component.
It's easy to start with technical differences between the two categories.
Most In-Memory Databases are your father's RDBMS that store data "in memory" instead of disk. That's practically all there's to it. They provide good SQL support with only a modest list of unsupported SQL features, shipped with ODBC/JDBC drivers and can be used in place of existing RDBMS often without significant changes.
In-Memory Data Grids typically lack full ANSI SQL support but instead provide MPP-based (Massively Parallel Processing) capabilities where data is spread across large cluster of commodity servers and processed in explicitly parallel fashion. The main access pattern is key/value access, MapReduce, various forms of HPC-like processing, and a limited distributed SQL querying and indexing capabilities.
It is important to note that there is a significant crossover from In-Memory Data Grids to In-Memory Databases in terms of SQL support. GridGain, for example, provides pretty serious and constantly growing support for SQL including pluggable indexing, distributed joins optimization, custom SQL functions, etc.
Speed Only vs. Speed + Scalability
One of the crucial differences between In-Memory Data Grids and In-Memory Databases lies in the ability to scale to hundreds and thousands of servers. That is the In-Memory Data Grid's inherent capability for such scale due to their MPP architecture, and the In-Memory Database's explicit inability to scale due to fact that SQL joins, in general, cannot be efficiently performed in a distribution context.
It's one of the dirty secrets of In-Memory Databases: one of their most useful features, SQL joins, is also is their Achilles heel when it comes to scalability. This is the fundamental reason why most existing SQL databases (disk or memory based) are based on vertically scalable SMP (Symmetrical Processing) architecture unlike In-Memory Data Grids that utilize the much more horizontally scalable MPP approach.
It's important to note that both In-Memory Data Grids and In-Memory Database can achieve similar speed in a local non-distributed context. In the end - they both do all processing in memory.
But only In-Memory Data Grids can natively scale to hundreds and thousands of nodes providing unprecedented scalability and unrivaled throughput.
Replace Database vs. Change Application
Apart from scalability, there is another difference that is important for uses cases where In-Memory Data Grids or In-Memory Database are tasked with speeding up existing systems or applications.
An In-Memory Data Grid always works with an existing database providing a layer of massively distributed in-memory storage and processing between the database and the application. Applications then rely on this layer for super-fast data access and processing. Most In-Memory Data Grids can seamlessly read-through and write-through from and to databases, when necessary, and generally are highly integrated with existing databases.
In exchange - developers need to make some changes to the application to take advantage of these new capabilities. The application no longer "talks" SQL only, but needs to learn how to use MPP, MapReduce or other techniques of data processing.
In-Memory Databases provide almost a mirror opposite picture: they often requirereplacing your existing database (unless you use one of those In-Memory "options" to temporary boost your database performance) - but will demand significantly less changes to the application itself as it will continue to rely on SQL (albeit a modified dialect of it).
In the end, both approaches have their advantages and disadvantages, and they may often depend in part on organizational policies and politics as much as on their technical merits.
The bottom line should be pretty clear by now.
If you are developing a green-field, brand new system or application the choice is pretty clear in favor of In-Memory Data Grids. You get the best of the two worlds: you get to work with the existing databases in your organization where necessary, and enjoy tremendous performance and scalability benefits of In-Memory Data Grids - both of which are highly integrated.
If you are, however, modernizing your existing enterprise system or application the choice comes down to this:
You will want to use an In-Memory Database if the following applies to you:
- You can replace or upgrade your existing disk-based RDBMS
- You cannot make changes to your applications
- You care about speed, but don't care as much about scalability
In other words - you boost your application's speed by replacing or upgrading RDBMS without significantly touching the application itself.
On the other hand, you want to use an In-Memory Data Grid if the following applies to you:
- You cannot replace your existing disk-based RDBMS
- You can make changes to (the data access subsystem of) your application
- You care about speed and especially about scalability, and don't want to trade one for the other
In other words - with an In-Memory Data Grid you can boost your application's speed and provide massive scale by tweaking the application, but without making changes to your existing database.
It can be summarized it in the following table:
|In-Memory Data Grid||In-Memory Database|
|Existing RDBMS||Unchanged||Changed or Replaced|
There will be 150 billion connected devices by 2020. New digital businesses have already disrupted value chains across every industry. APIs are at the center of the digital business. You need to understand what assets you have that can be exposed digitally, what their digital value chain is, and how to create an effective business model around that value chain to compete in this economy. No enterprise can be complacent and not engage in the digital economy. Learn how to be the disruptor and not the disruptee.
May. 30, 2015 10:15 AM EDT Reads: 1,195
2015 predictions circa 1970: houses anticipate our needs and adapt, city infrastructure is citizen and situation aware, office buildings identify and preprocess you. Today smart buildings have no such collective conscience, no shared set of fundamental services to identify, predict and synchronize around us. LiveSpace and M2Mi are changing that. LiveSpace Smart Environment devices deliver over the M2Mi IoT Platform real time presence, awareness and intent analytics as a service to local connected devices. In her session at @ThingsExpo, Sarah Cooper, VP Business of Development at M2Mi, will d...
May. 30, 2015 10:00 AM EDT Reads: 1,052
Thanks to widespread Internet adoption and more than 10 billion connected devices around the world, companies became more excited than ever about the Internet of Things in 2014. Add in the hype around Google Glass and the Nest Thermostat, and nearly every business, including those from traditionally low-tech industries, wanted in. But despite the buzz, some very real business questions emerged – mainly, not if a device can be connected, or even when, but why? Why does connecting to the cloud create greater value for the user? Why do connected features improve the overall experience? And why do...
May. 30, 2015 09:45 AM EDT Reads: 1,234
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
May. 30, 2015 09:30 AM EDT Reads: 7,335
Imagine a world where targeting, attribution, and analytics are just as intrinsic to the physical world as they currently are to display advertising. Advances in technologies and changes in consumer behavior have opened the door to a whole new category of personalized marketing experience based on direct interactions with products. The products themselves now have a voice. What will they say? Who will control it? And what does it take for brands to win in this new world? In his session at @ThingsExpo, Zack Bennett, Vice President of Customer Success at EVRYTHNG, will answer these questions a...
May. 30, 2015 09:15 AM EDT Reads: 1,015
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal an...
May. 30, 2015 09:00 AM EDT Reads: 3,330
SYS-CON Events announced today that BMC will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. BMC delivers software solutions that help IT transform digital enterprises for the ultimate competitive business advantage. BMC has worked with thousands of leading companies to create and deliver powerful IT management services. From mainframe to cloud to mobile, BMC pairs high-speed digital innovation with robust IT industrialization – allowing customers to provide amazing user experiences with optimized IT per...
May. 30, 2015 08:15 AM EDT Reads: 1,868
We’re entering a new era of computing technology that many are calling the Internet of Things (IoT). Machine to machine, machine to infrastructure, machine to environment, the Internet of Everything, the Internet of Intelligent Things, intelligent systems – call it what you want, but it’s happening, and its potential is huge. IoT is comprised of smart machines interacting and communicating with other machines, objects, environments and infrastructures. As a result, huge volumes of data are being generated, and that data is being processed into useful actions that can “command and control” thi...
May. 30, 2015 08:00 AM EDT Reads: 1,502
Building low-cost wearable devices can enhance the quality of our lives. In his session at Internet of @ThingsExpo, Sai Yamanoor, Embedded Software Engineer at Altschool, provided an example of putting together a small keychain within a $50 budget that educates the user about the air quality in their surroundings. He also provided examples such as building a wearable device that provides transit or recreational information. He then reviewed the resources available to build wearable devices at home including open source hardware, the raw materials required and the options available to power s...
May. 30, 2015 05:30 AM EDT Reads: 4,578
In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect at GE, and Ibrahim Gokcen, who leads GE's advanced IoT analytics, focused on the Internet of Things / Industrial Internet and how to make it operational for business end-users. Learn about the challenges posed by machine and sensor data and how to marry it with enterprise data. They also discussed the tips and tricks to provide the Industrial Internet as an end-user consumable service using Big Data Analytics and Industrial Cloud.
May. 30, 2015 04:30 AM EDT Reads: 5,887
We certainly live in interesting technological times. And no more interesting than the current competing IoT standards for connectivity. Various standards bodies, approaches, and ecosystems are vying for mindshare and positioning for a competitive edge. It is clear that when the dust settles, we will have new protocols, evolved protocols, that will change the way we interact with devices and infrastructure. We will also have evolved web protocols, like HTTP/2, that will be changing the very core of our infrastructures. At the same time, we have old approaches made new again like micro-services...
May. 30, 2015 03:30 AM EDT Reads: 5,843
How do APIs and IoT relate? The answer is not as simple as merely adding an API on top of a dumb device, but rather about understanding the architectural patterns for implementing an IoT fabric. There are typically two or three trends: Exposing the device to a management framework Exposing that management framework to a business centric logic Exposing that business layer and data to end users. This last trend is the IoT stack, which involves a new shift in the separation of what stuff happens, where data lives and where the interface lies. For instance, it's a mix of architectural styles ...
May. 30, 2015 03:00 AM EDT Reads: 6,214
Connected devices and the Internet of Things are getting significant momentum in 2014. In his session at Internet of @ThingsExpo, Jim Hunter, Chief Scientist & Technology Evangelist at Greenwave Systems, examined three key elements that together will drive mass adoption of the IoT before the end of 2015. The first element is the recent advent of robust open source protocols (like AllJoyn and WebRTC) that facilitate M2M communication. The second is broad availability of flexible, cost-effective storage designed to handle the massive surge in back-end data in a world where timely analytics is e...
May. 30, 2015 02:00 AM EDT Reads: 6,583
Collecting data in the field and configuring multitudes of unique devices is a time-consuming, labor-intensive process that can stretch IT resources. Horan & Bird [H&B], Australia’s fifth-largest Solar Panel Installer, wanted to automate sensor data collection and monitoring from its solar panels and integrate the data with its business and marketing systems. After data was collected and structured, two major areas needed to be addressed: improving developer workflows and extending access to a business application to multiple users (multi-tenancy). Docker, a container technology, was used to ...
May. 30, 2015 01:00 AM EDT Reads: 2,850
The true value of the Internet of Things (IoT) lies not just in the data, but through the services that protect the data, perform the analysis and present findings in a usable way. With many IoT elements rooted in traditional IT components, Big Data and IoT isn’t just a play for enterprise. In fact, the IoT presents SMBs with the prospect of launching entirely new activities and exploring innovative areas. CompTIA research identifies several areas where IoT is expected to have the greatest impact.
May. 29, 2015 09:00 PM EDT Reads: 5,576
The Industrial Internet revolution is now underway, enabled by connected machines and billions of devices that communicate and collaborate. The massive amounts of Big Data requiring real-time analysis is flooding legacy IT systems and giving way to cloud environments that can handle the unpredictable workloads. Yet many barriers remain until we can fully realize the opportunities and benefits from the convergence of machines and devices with Big Data and the cloud, including interoperability, data security and privacy.
May. 29, 2015 03:45 PM EDT Reads: 5,154
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In this session, James Kirkland, Red Hat's Chief Architect for the Internet of Things and Intelligent Systems, will describe how to revolutionize your architecture and...
May. 29, 2015 02:33 PM EDT Reads: 969
The Internet of Things is tied together with a thin strand that is known as time. Coincidentally, at the core of nearly all data analytics is a timestamp. When working with time series data there are a few core principles that everyone should consider, especially across datasets where time is the common boundary. In his session at Internet of @ThingsExpo, Jim Scott, Director of Enterprise Strategy & Architecture at MapR Technologies, discussed single-value, geo-spatial, and log time series data. By focusing on enterprise applications and the data center, he will use OpenTSDB as an example t...
May. 29, 2015 02:00 PM EDT Reads: 6,926
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo, June 9-11, 2015, at the Javits Center in New York City. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be
May. 29, 2015 01:15 PM EDT Reads: 3,142
Scott Jenson leads a project called The Physical Web within the Chrome team at Google. Project members are working to take the scalability and openness of the web and use it to talk to the exponentially exploding range of smart devices. Nearly every company today working on the IoT comes up with the same basic solution: use my server and you'll be fine. But if we really believe there will be trillions of these devices, that just can't scale. We need a system that is open a scalable and by using the URL as a basic building block, we open this up and get the same resilience that the web enjoys.
May. 29, 2015 01:00 PM EDT Reads: 7,574