| By Daniel Thompson | Article Rating: |
|
| February 15, 2013 10:00 AM EST | Reads: |
1,864 |
My working title was Big Data, Storage Dilemma.
They say dilemma. I say dilemna. I'm serious. I spell it dilemna.
Big Data presents something of a storage dilemma. There is no one data store to rule them all.
Should different data structures be persisted to different storage mediums?
Storage Medium
Identifying the appropriate medium is a function of performance, cost, and capacity.
Random Access Memory
It's fast. Very. It's expense. Very.
If we are configuring an HP DL980 G7 server, it will cost $3,672 for 128GB of memory with 8x 16GB modules or $9,996 for 128GB of memory with 4x 32GB modules. That is $28-78 / GB.
We can configure an HP DL980 G7 server with up to 4TB of memory.
Solid State Drive
It's not as fast or as expensive as random access memory. It's faster than a hard disk drive. A lot faster. It's more expensive than a hard disk drive. A lot more expensive.
If we are configuring an HP DL980 G7 server, it will cost $4,199 for a 400GB SAS MLC drive or $7,419 for a 400GB SAS SLC drive. That is $10-18 / GB. It's a performance / size trade-off. While SLC drives often perform better, MLC drives are often available in larger sizes. Further, the read / write performance is either symmetric or asymmetric.
That, and SLC drives often have greater endurance.
| Sequential Read (MB/s) |
Sequential Write (MB/s) |
Random Read (IOPS) |
Random Write (IOPS) |
|
| Intel 520 (480GB) | 550 | 520 | 50,000 | 50,000 |
| Intel 520 (240GB) | 550 | 520 | 50,000 | 80,000 |
| Intel DC S3700 | 500 | 460 | 75,000 | 36,000 |
The performance of solid state drives can vary:
- consumer / enterprise
- MLC / eMLC / SLC
- SATA / SAS / PCIe
The capacity of enterprise solid state drives is often less than that of enterprise hard disk drives (4TB). However, the capacity of PCIe drives such as the Fusion-io ioDrive Octal (5.12TB / 10.24TB) is greater than that of enterprise hard disk drives (4TB). In addition, the performance of PCIe drives is greater than that of SAS or SATA drives.
| Sequential Read (MB/s) |
Sequential Write (MB/s) |
Random Read (IOPS) |
Random Write (IOPS) |
|
| Intel 910 (800GB) | 2,000 | 1,000 | 180,000 | 75,000 |
Hard Disk Drive
It may not be fast, but it is inexpensive.
If we are configuring an HP DL980 G7 server, it will cost $309 for a 300GB 10K drive or $649 for a 300GB 15K drive. That is $1-2 / GB. It's a performance / size trade-off. While a 15K drive will perform better than a 10K drive, it will often have less capacity. The sequential read / write performance of hard disk drives is often between 150-200MB/s. As such, a RAID configuration may be a cost effective alternative to a single solid state drive for sequential read / write access.
The capacity of enterprise hard disk drives (4TB) is often greater than that of enterprise solid state drives.
Data Structure
Hash Table
JBoss Data Grid is an in-memory data grid with the data stored in a hash table. However, JBoss Data Grid supports persistence via write-behind or write-through. For all intents and purposes (e.g. map / reduce), all of the data should fit in memory.
Access
- Random Reads, Random Writes
Riak is a key / value store. If persistence has been configured with Bitcast, the data is stored in a log structured hash table. An in-memory hash table contains the key / value pointers. The value is a pointer to the data. As such, all of the key / value pointers must fit in memory. The data is persisted via append only log files.
Access
- Complex Index in Memory
- Random Reads, Sequential Writes
Point queries perform well in hash tables. Range queries do not. Though there is always map / reduce.
B-Tree
MongoDB is a document database with the data stored in a B-Tree. The data is persisted via memory mapped files.
Access
- Partial Index (Internal Nodes) in Memory
- Random Reads, Sequential Reads, Random Writes
CouchDB is a document database with the data stored in a B+Tree. The data is persisted via append only log files.
Access
- Partial Index in Memory
- Random Reads, Sequential Reads, Sequential Writes
In a B+ Tree, the data is only stored in the leaf nodes. In a B-Tree, the data is stored in both the internal nodes and the leaf nodes. The advantage of a B+ Tree is that the leaf nodes are linked. As a result, range queries perform better with a B+ Tree. However, point queries perform better with a B-Tree.
CouchDB has implemented an append only B+ Tree. An alternative is the copy-on-write (CoW) B+ Tree. A write optimization for a B-Tree is buffering. First, the data written to an internal buffer in an internal node. Second, the buffer is flushed to a leaf node. As a result, random writes are turned in to sequential writes. The cost of random writes is thus amortized. However, I am not aware of any open source data stores that have implemented a CoW B+ Tree or have implemented buffering with a B-Tree.
Range queries perform well with a B/B+ Tree. Point queries, not as well as hash tables.
Log Structured Merge Tree
Apache HBase and Apache Cassandra are both column oriented data stores, and they have both store data in an LSM-Tree. Apache HBase has implemented a cache oblivious lookahead array (COLA). Apache Cassandra has implemented a sorted array merge tree (SAMT).
In both implementations, a write is first written to a write-ahead log. Next, it is written to a memtable. A memtable is an in-memory sorted string table (SSTable). Later, the memtable is flushed and persisted as an SSTable. As result, random writes are turned in to sequential writes. However, a point query with an LSM-Tree may require multiple random reads.
Apache HBase implemented a single-level index with HFile version 1. However, because every index was cached, it resulted in high memory usage.
Apache HBase implemented a multi-level index a la a B+ Tree with HFile (SSTable) version 2. The SSTable contains a root index, a root index with leaf indexes, or a root index with an intermediate index and leaf indexes. The root index is stored in memory. The intermediate and leaf indexes may be stored in the block cache. In addition, the SSTable contains a compound (block level) bloom filter. The bloom filter is used to determine if the data is not in the data block.
Recommendation / Access
- Partial Index / Bloom Filter in Memory
- Random Reads, Sequential Reads, Sequential Writes
A read optimization for an LSM-Tree is fractional cascading. The idea is that each level contains both data and pointers to data in the next level. However, I am not aware of any open source data stores that have implemented fractional cascading.
I like the idea of a hybrid / tiered storage solution for an LSM-Tree implementation with the first level in memory, second level on solid state drives, and third level on hard disk drives. I've seen this solution described in academia, but I am not aware of any open source implementations.
Conclusion
I find it interesting that key / value stores often store data in a hash table, that document stores often store data in a B+/B-Tree, and that column oriented stores often store data in an LSM-Tree. Perhaps it is because key / value stores focus on the performance of point queries, document stores support secondary indexes (MongoDB) / views (CouchDB), and column oriented stores support range queries. Perhaps it because document stores were not created with distribution (shards / partitions) in mind and thus assume that the complete index can not be stored in memory.
Notes
I found this paper to be very helpful in examining data structures as it covers most of the ones highlighted in this post:
Efficient, Scalable, and Versatile Application and System Transaction Management for Direct Storage Layers (link)
Read the original blog entry...
Published February 15, 2013 Reads 1,864
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Daniel Thompson
I curate the content on this page, but the credit goes to my talented colleagues for the posts that you see here. Much of what you read on this page is the work of friends at How to JBoss, and I encourage you to drop by the site at http://www.howtojboss.com for some of the best JBoss technical and non-technical content for developers, architects and technology executives on the Web.
- Cloud People: A Who's Who of Cloud Computing
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Cloud Expo New York: Delivering Digital Marketing on the Cloud
- Cloud Expo New York: Rethink IT and Reinvent Business with IBM SmartCloud
- Cloudant to Exhibit at Cloud Expo & Big Data Expo New York
- The Accessibility of the Cloud
- Learn How To Use Google Apps Script
- Cloud Expo NY: Best Practices for Delivering Oracle Database as a Service
- Cloud Expo New York: Basics of SSD Technology and Its Use in Cloud
- Session Topics: 12th Cloud Expo / Cloud Expo New York
- Cloud Expo New York: The Big Challenge of Big Data & Hadoop Integration
- Measuring the Business Value of Cloud Computing
- Cloud People: A Who's Who of Cloud Computing
- Cloud Expo New York: Best CIO Practices Shared from SHI’s Customers
- Cloud Expo New York: How to Use Google Apps Script
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Cloud Expo New York: Why Big Data Is Really About Small Data
- Cloud Expo New York: Delivering Digital Marketing on the Cloud
- Small Cancers, Big Data, and a Life Examined
- Cloud Expo New York: Requirements of a Cloud Database
- Cloud Expo New York: Rethink IT and Reinvent Business with IBM SmartCloud
- Cloudant to Exhibit at Cloud Expo & Big Data Expo New York
- The Accessibility of the Cloud
- Learn How To Use Google Apps Script
- A Cup of AJAX? Nay, Just Regular Java Please
- Java Developer's Journal Exclusive: 2006 "JDJ Editors' Choice" Awards
- JavaServer Faces (JSF) vs Struts
- The i-Technology Right Stuff
- Rich Internet Applications with Adobe Flex 2 and Java
- Java vs C++ "Shootout" Revisited
- Bean-Managed Persistence Using a Proxy List
- Reporting Made Easy with JasperReports and Hibernate
- Creating a Pet Store Application with JavaServer Faces, Spring, and Hibernate
- Why Do 'Cool Kids' Choose Ruby or PHP to Build Websites Instead of Java?
- What's New in Eclipse?
- Where Are RIA Technologies Headed in 2008?























