Click here to close now.

Welcome!

Java Authors: Liz McMillan, Carmen Gonzalez, Pat Romanski, Elizabeth White, Jnan Dash

Related Topics: Virtualization, Java, Microservices Journal, Cloud Expo, Big Data Journal, SDN Journal

Virtualization: Article

The Big Data Bottleneck: Uploading to the Cloud

If only we could get those gigando-bytes into the Cloud in the first place. And there’s the rub.

The problem with Big Data is that, well, Big Data are big. Really big. We’re talking terabytes. Petabytes. Zettabytes. Whatever’s-even-bigger-bytes. And of course, we want to solve all our Big Data challenges in the Cloud. If only we could get those gigando-bytes into the Cloud in the first place. And there’s the rub.

Uploading Big Data from our internal network to the Cloud via an Internet connection is as practical as filling a swimming pool through a drinking straw. It doesn’t matter how sophisticated our Big Data analytics, how super-duper our Hadoopers. If we can’t efficiently get our data where we need them when we need them, we’re stuck.

Optimize the Pipe
Fortunately, the Big Data upload problem isn’t new. In fact, it’s been around for years, under the moniker Wide Area Network (WAN) Optimization. Fortunate for us because vendors have been working on WAN Optimization techniques for a while now, and now several of them are repurposing those techniques to help with the Cloud.

For example, Aryaka has been peddling WAN Optimization appliances for several years. Put one appliance in your local data center, a second in the remote data center, and proprietary technology moves data from one to the other at a rapid clip. Now that the Cloud has turned their world upside down, they are providing a distributed service at the remote end, a “mesh of network connections” better suited to the Cloud. In other words, Aryaka is building an offering similar to Content Delivery Networks (CDNs) like Akamai.

RainStor, in contrast, focuses primarily on a proprietary compression algorithm that promises to squeeze data into one fortieth their original size. Furthermore, RainStor’s compressed data remain directly accessible using standard SQL or even MapReduce on Hadoop with no storage-eating, time-consuming reinflation.

Then there’s Aspera, who’s found a sophisticated way around the limitations of the Transmission Control Protocol (TCP) itself. After all, TCP’s tiny packets and penchant for resending them are a large part of the reason uploading Big Data over the Internet runs like such a dog in the first place. To teach this dog a new trick or two, Aspera transfers use one TCP port for session initialization and control, and one User Datagram Protocol (UDP) port for data transfer.

UDP is an older, fire-and-forget protocol that doesn’t perform the retries that provide TCP’s reliability, but by combining the two protocols, FASP achieves nearly 100% error-free data throughput. In fact, FASP reaches the maximum transfer speed possible given the hardware on which you deploy it, and maintains maximum available throughput independent of network delay and packet loss. FASP also aggregates hundreds of concurrent transfers on commodity hardware, addressing the drinking straw problem in part by supporting hundreds of straws at once.

CloudOpt is also a player worth mentioning. Their JetStream technology takes a soup-to-nuts approach that combines compression and transmission protocol optimization with advanced data deduplication, SSL acceleration, and an ingenious approach to getting the most performance out of cached data. Or Attunity Cloudbeam, that touts file to Cloud upload, file to Cloud replication, and Cloud to Cloud replication. Attunity’s Managed File Transfer (MFT) incorporates a secure DMZ architecture, security policy enforcement, guaranteed and accelerated transfers, process automation, and audit capabilities across each stage of the file transfer process.

Finally, there’s Amazon Web Services (AWS) itself. Yes, most if not all of the vendors discussed above can firehose data into AWS’s various storage services. But AWS also offers a simple, if decidedly low-tech approach as well: AWS Import/Export. Simply ship your big hard drives to Amazon. They’ll hook them up, copy the data to your Simple Storage Service (S3) or other storage service, and ship the drive back when you’re done. This SneakerNet or “Forklifting” approach, believe it or not, can even be faster than some of the over-the-Internet optimizations for certain Big Data sets, even considering the time it takes to FedEx AWS your drives.

On Beyond Drinking Straws
The problem with most of the approaches above (excepting only Aspera and Amazon’s forklift) is that they make the drinking straw we’re using to fill that swimming pool better, faster, and bigger – but we’re still filling that damn pool with a straw. So what’s better than a straw? How about many straws? If any optimization technique improves a single connection to the Internet, then it stands to reason that establishing many connections to your Cloud provider in parallel would multiply your upload speed dramatically.

Fair enough, but let’s think out of the box here. A fundamental Big Data best practice is to bring your analytics to your data. The reasoning is that it’s hard to move your data but easy to move your software, so once your data are in the Cloud, you should also run your analytics there.

But this argument also works in reverse. If your data aren’t in the Cloud, then it may not make sense to move them to the Cloud simply to run your software there. Instead, bring your software to your data, even if they’re on premise.

Perish the thought, you say! We’re sold on Big Data in the Cloud. We’ve crunched the numbers and we know it’s going to save us money, provide more capabilities, and facilitate sharing information across our organization and the world. Fair enough. Here’s another twist for you.

Why are your Big Data sets outside the Cloud to begin with? Sure, you’re stuck with existing, legacy data sets wherever they happen to be today. But as a rule, those don’t constitute Big Data, or will cease to qualify as being large enough to warrant the Big Data label relatively soon. By definition, Big Data sets keep expanding exponentially, which means that you keep creating them with generations of newfangled tools.

In fact, there are already multitudinous sources for raw Big Data, as varied as the Big Data challenges organizations struggle with today. But many such sources are already in the Cloud, or could be moved to the Cloud simply. For example, clickthrough data from your Web sites. Such data come from your Web servers, which should be in the Cloud anyway. If your Big Data come from Web Servers scattered here and there in the Cloud, then moving the clickthrough data to a Big Data repository for processing can be handled in the same Cloud. No need for uploading.

What about data sources that aren’t already in the Cloud? Many Big Data streams come from instrumentation or sensors of some sort, from seismographs underground to EKGs in hospitals to UPC scanners in supermarkets. There’s no reason why such instrumentation shouldn’t pour their raw data feeds directly to the Cloud. What good is storing a week’s worth of supermarket purchasing data on premise anyway? You’ll want to store, process, manage, and analyze those data in the Cloud, so the sooner you get it there, the better.

The ZapThink Take
The only reason we have to worry about uploading Big Data to the Cloud in the first place is because our Big Data aren’t already in the Cloud. And broadly speaking, the reason they’re not already in the Cloud is because the Cloud isn’t everywhere. Instead, we think of the Cloud as being locked away in data centers, those alien, air conditioned facilities packed full of racks of high tech equipment.

That may be true today, but as ZapThink has discussed before, there’s nothing in the definition of Cloud Computing that requires Cloud resources to live in data centers. You might have a bit of the Cloud in your pocket, or on your laptop, in your car, or in your refrigerator. For now, this vision of the Internet of Things meeting the Cloud is mostly the stuff of science fiction. We’re only now figuring out what it means to have a ubiquitous global network of sensors, from the aforementioned EKGs and UPC scanners to traffic cameras to home thermostats. But the writing is on the wall. Just as we now don’t think twice about carrying supercomputers in our pockets, it’s only a matter of time until the Cloud itself is fully distributed and ubiquitous. When that happens, the question of moving Big Data to the Cloud will be moot. They will already be there.

Are you one of the vendors mentioned in this article and have a correction, or a vendor who should have been mentioned but wasn’t? Please feel free to comment here.

Image Source: US Navy

More Stories By Jason Bloomberg

Jason Bloomberg is the leading expert on architecting agility for the enterprise. As president of Intellyx, Mr. Bloomberg brings his years of thought leadership in the areas of Cloud Computing, Enterprise Architecture, and Service-Oriented Architecture to a global clientele of business executives, architects, software vendors, and Cloud service providers looking to achieve technology-enabled business agility across their organizations and for their customers. His latest book, The Agile Architecture Revolution (John Wiley & Sons, 2013), sets the stage for Mr. Bloomberg’s groundbreaking Agile Architecture vision.

Mr. Bloomberg is perhaps best known for his twelve years at ZapThink, where he created and delivered the Licensed ZapThink Architect (LZA) SOA course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, the leading SOA advisory and analysis firm, which was acquired by Dovel Technologies in 2011. He now runs the successor to the LZA program, the Bloomberg Agile Architecture Course, around the world.

Mr. Bloomberg is a frequent conference speaker and prolific writer. He has published over 500 articles, spoken at over 300 conferences, Webinars, and other events, and has been quoted in the press over 1,400 times as the leading expert on agile approaches to architecture in the enterprise.

Mr. Bloomberg’s previous book, Service Orient or Be Doomed! How Service Orientation Will Change Your Business (John Wiley & Sons, 2006, coauthored with Ron Schmelzer), is recognized as the leading business book on Service Orientation. He also co-authored the books XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996).

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting).

@ThingsExpo Stories
SYS-CON Events announced today that IDenticard will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. IDenticard™ is the security division of Brady Corp (NYSE: BRC), a $1.5 billion manufacturer of identification products. We have small-company values with the strength and stability of a major corporation. IDenticard offers local sales, support and service to our customers across the United States and Canada. Our partner network encompasses some 300 of the world's leading systems integrators and security s...
SYS-CON Events announced today that SoftLayer, an IBM company, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place June 9-11, 2015 at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place November 3–5, 2015 at the Santa Clara Convention Center in Santa Clara, CA. SoftLayer operates a global cloud infrastructure platform built for Internet scale. With a global footprint of data centers and network points of presence, SoftLayer provides infrastructure as a service to leading-edge customers ranging from ...
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborate. Cisco and our partners are building the platform for the Internet of Everything by connecting the...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
SYS-CON Events announced today that Windstream, a leading provider of advanced network and cloud communications, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Windstream (Nasdaq: WIN), a FORTUNE 500 and S&P 500 company, is a leading provider of advanced network communications, including cloud computing and managed services, to businesses nationwide. The company also offers broadband, phone and digital TV services to consumers primarily in rural areas.
SYS-CON Events announced today that Dyn, the worldwide leader in Internet Performance, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Dyn is a cloud-based Internet Performance company. Dyn helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through a world-class network and unrivaled, objective intelligence into Internet conditions, Dyn ensures traffic gets delivered faster, safer, and more reliably than ever.
SYS-CON Events announced today that Open Data Centers (ODC), a carrier-neutral colocation provider, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Open Data Centers is a carrier-neutral data center operator in New Jersey and New York City offering alternative connectivity options for carriers, service providers and enterprise customers.
SYS-CON Events announced today that On the Avenue Marketing Group, a sales and marketing firm that utilizes events to market and sell products to consumers, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. On the Avenue Marketing Group (OTA) is a sales and marketing firm that utilizes events to market and sell products to consumers. On behalf of our clients, we attend thousands of fairs, festivals, expos, concerts, conferences, and sporting events annually, helping them reach millions of individuals ...
SYS-CON Events announced today that ActiveState, the leading independent Cloud Foundry and Docker-based PaaS provider, has been named “Silver Sponsor” of SYS-CON's DevOps Summit New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. ActiveState believes that enterprises gain a competitive advantage when they are able to quickly create, deploy and efficiently manage software solutions that immediately create business value, but they face many challenges that prevent them from doing so. The Company is uniquely positioned to help address these challenges thro...
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes for use cases across the industrial, enterprise, and consumer segments.
SYS-CON Events announced today that Alert Logic, the leading provider of Security-as-a-Service solutions for the cloud, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® and DevOps Summit 2015 New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo® and DevOps Summit 2015 Silicon Valley, which will take place November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that kintone has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. kintone promotes cloud-based workgroup productivity, transparency and profitability with a seamless collaboration space, build your own business application (BYOA) platform, and workflow automation system.
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Things. Akana enables enterprises to share data as APIs, connect and integrate applications, drive part...
SYS-CON Events announced today that CommVault has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. A singular vision – a belief in a better way to address current and future data management needs – guides CommVault in the development of Singular Information Management® solutions for high-performance data protection, universal availability and sim...
SYS-CON Events announced today that SafeLogic has been named “Bag Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. SafeLogic provides security products for applications in mobile and server/appliance environments. SafeLogic’s flagship product CryptoComply is a FIPS 140-2 validated cryptographic engine designed to secure data on servers, workstations, appliances, mobile devices, and in the Cloud.
SYS-CON Events announced today that StorPool Storage will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. StorPool is distributed storage software that allows service providers, enterprises and other cloud builders to run data storage on standard x86 servers, instead of using expensive and inefficient storage arrays (SAN).
SYS-CON Events announced today that Site24x7, the cloud infrastructure monitoring service, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Site24x7 is a cloud infrastructure monitoring service that helps monitor the uptime and performance of websites, online applications, servers, mobile websites and custom APIs. The monitoring is done from 50+ locations across the world and from various wireless carriers, thus providing a global perspective of the end-user experience. Site24x7 supports monitoring H...
SYS-CON Events announced today that Intelligent Systems Services will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Established in 1994, Intelligent Systems Services Inc. is located near Washington, DC, with representatives and partners nationwide. ISS’s well-established track record is based on the continuous pursuit of excellence in designing, implementing and supporting nationwide clients’ mission-critical systems. ISS has completed many successful projects in Healthcare, Commercial, Manufacturing, ...
SYS-CON Events announced today that B2Cloud, a provider of enterprise resource planning software, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. B2cloud develops the software you need. They have the ideal tools to help you work with your clients. B2Cloud’s main solutions include AGIS – ERP, CLOHC, AGIS – Invoice, and IZUM
SYS-CON Events announced today that Vicom Computer Services, Inc., a provider of technology and service solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. They are located at booth #427. Vicom Computer Services, Inc. is a progressive leader in the technology industry for over 30 years. Headquartered in the NY Metropolitan area. Vicom provides products and services based on today’s requirements around Unified Networks, Cloud Computing strategies, Virtualization around Software defined Data Ce...