Click here to close now.

Welcome!

Java Authors: Liz McMillan, VictorOps Blog, AppDynamics Blog, Leon Fayer, Pat Romanski

Related Topics: Big Data Journal, Java, MICROSERVICES, Cloud Expo, Apache, Security

Big Data Journal: Blog Feed Post

How to Secure Hadoop Without Touching It

Combining API Security and Hadoop

It sounds like a parlor trick, but one of the benefits of API centric de-facto standards  such as REST and JSON is they allow relatively seamless communication between software systems.

This makes it possible to combine technologies to instantly bring out new capabilities. In particular I want to talk about how an API Gateway can improve the security posture of a Hadoop installation without having to actually modify Hadoop itself. Sounds too good to be true? Read on.

Hadoop and RESTful APIs
Hadoop is mostly a behind the firewall affair, and APIs are generally used for exposing data or capabilities for other systems, users or mobile devices. In the case of Hadoop there are three main RESTful APIs to talk about. This list isn’t exhaustive but it covers the main APIs.

  1. WebHDFS – Offers complete control over files and directories in HDFS
  2. HBase REST API – Offers access to insert, create, delete, single/multiple cell values
  3. HCatalog REST API – Provides job control for Map/Reduce, Pig and Hive as well as to access and manipulate HCatalog DDL data

These APIs are very useful because anyone with an HTTP client can potentially manipulate data in Hadoop. This, of course, is like using a knife all-blade – it’s very easy to cut yourself. To take an example, WebHDFS allows RESTful calls for directory listings, creating new directories and files, as well as file deletion. Worse,  the default security model requires nothing more than inserting “root” into the HTTP call.

To its credit, most distributions of Hadoop also offer Kerberos SPNEGO authentication, but additional work is needed to support other types of authentication and authorization schemes, and not all REST calls that expose sensitive data (such as a list of files) are secured. Here are some of the other challenges:

  • Fragmented Enforcement – Some REST calls leak information and require no credentials
  • Developer Centric Interfaces – Full Java stack traces are passed back to callers, leaking system details
  • Resource Protection – The Namenode is a single point of failure and excessive WebHDFS activity may threaten the cluster
  • Consistent Security Policy – All APIs in Hadoop must be independently configured, managed and audited over time

This list is just a start, and to be fair, Hadoop is still evolving. We expect things to get better over time, but for Enterprises to unlock value from their “Big Data” projects now, they can’t afford to wait until security is perfect.

One model used in other domains is an API Gateway or proxy that sits between the Hadoop cluster and the client. Using this model, the cluster only trusts calls from the gateway and all potential API callers are forced to use the gateway. Further, the gateway capabilities are rich enough and expressive enough to perform the full depth and breadth of security for REST calls from authentication to message level security, tokenization, throttling, denial of service protection, attack protection and data translation. Even better, this provides a safe and effective way to expose Hadoop to mobile devices without worrying about performance, scalability and security.  Here is the conceptual picture:

Intel Expressway API Manager and Intel Distribution of Apache Hadoop

In the previous diagram we are showing the Intel(R) Expressway API Manager acting as a proxy for WebHDFS, HBase and HCatalog APIs exposed from Intel’s Hadoop distribution. API Manager exposes RESTful APIs and also provides an out of the box subscription to Mashery to help evangelize APIs among a community of developers.

All of the policy enforcement is done at the HTTP layer by the gateway and the security administrator is free to rewrite the API to be more user friendly to the caller and the gateway will take care of mapping and rewriting the REST call to the format supported by Hadoop. In short, this model lets you provide instant Enterprise security for a good chunk of Hadoop capabilities without having to add a plug-in, additional code or a special distribution of Hadoop. So… just what can you do without touching Hadoop? To take WebHDFS as an example the following is possible with some configuration on the gateway itself:

  1. A gateway can lock-down the standard WebHDFS REST API and allow access only for specific users based on an Enterprise identity that may be stored in LDAP, Active Directory, Oracle, Siteminder, IBM or Relational Databases.
  2. A gateway provides additional authentication methods such as X.509 certificates with CRL and OCSP checking, OAuth token handling, API keys support, WS-Security and SSL termination & acceleration for WebHDFS API calls. The gateway can expose secure versions of the WebDHFS API for external access
  3. A gateway can improve on the security model used by WebHDFS which carries identities in HTTP query parameters, which are more susceptible to credential leakage compared to a security model based on HTTP headers. The gateway can expose a variant of the WebHDFS API that expects credentials in the HTTP header and seamlessly maps this to the WebHDFS internal format
  4. The gateway workflow engine can maps a single function REST call into multiple WebHDFS calls. For example, the WebHDFS REST API requires two separate HTTP calls for file creation and file upload. The gateway can expose a single API for this that handles the sequential execution and error handling, exposing a single function to the user
  5. The gateway can strip and redact Java exception traces carried in the WebHDFS REST API responses ( for instance, JSON responses may carry org.apache.hadoop.security.AccessControlException.* which can spill details beneficial to an attacker
  6. The gateway can throttle and rate shape WebHDFS REST requests which can protect the Hadoop cluster from resource consumption from excessive HDFS writes, open file handles and excessive  create, read, update and delete operations which might impact a running job.

This list is just the start, API manager can also perform selective encryption and data protection (such as PCI tokenization or PII format preserving encryption) on data as it is inserted or deleted from the Hadoop cluster, all by sitting in-between the caller and the cluster. So the parlor trick here is really moving the problem from trying to secure hadoop from the inside out to moving and centralizing security to the enforcement point. If you are looking for a way to expose “Big Data” outside the cluster, an the API Gateway model may be worth some investigation!

Blake

 

The post How to secure Hadoop without touching it – combining API Security and Hadoop appeared first on Security Gateways@Intel.

Read the original blog entry...

More Stories By Application Security

This blog references our expert posts on application and web services security.

@ThingsExpo Stories
GENBAND has announced that SageNet is leveraging the Nuvia platform to deliver Unified Communications as a Service (UCaaS) to its large base of retail and enterprise customers. Nuvia’s cloud-based solution provides SageNet’s customers with a full suite of business communications and collaboration tools. Two large national SageNet retail customers have recently signed up to deploy the Nuvia platform and the company will continue to sell the service to new and existing customers. Nuvia’s capabilities include HD voice, video, multimedia messaging, mobility, conferencing, Web collaboration, deskt...
The Open Compute Project is a collective effort by Facebook and a number of players in the datacenter industry to bring lessons learned from the social media giant's giant IT deployment to the rest of the world. Datacenters account for 3% of global electricity consumption – about the same as all of Switzerland or the Czech Republic -- according to people I met at the recent Open Compute Summit in San Jose. With increasing mobility at the edge of the cloud and vast new dataflows being predicted with the growth of the Internet of Things (and The Coming Age of Many Zettabytes) in the near...
The list of ‘new paradigm’ technologies that now surrounds us appears to be at an all time high. From cloud computing and Big Data analytics to Bring Your Own Device (BYOD) and the Internet of Things (IoT), today we have to deal with what the industry likes to call ‘paradigm shifts’ at every level of IT. This is disruption; of course, we understand that – change is almost always disruptive.
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborate. Cisco and our partners are building the platform for the Internet of Everything by connecting the...
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
Temasys has announced senior management additions to its team. Joining are David Holloway as Vice President of Commercial and Nadine Yap as Vice President of Product. Over the past 12 months Temasys has doubled in size as it adds new customers and expands the development of its Skylink platform. Skylink leads the charge to move WebRTC, traditionally seen as a desktop, browser based technology, to become a ubiquitous web communications technology on web and mobile, as well as Internet of Things compatible devices.
SYS-CON Events announced today that robomq.io will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. robomq.io is an interoperable and composable platform that connects any device to any application. It helps systems integrators and the solution providers build new and innovative products and service for industries requiring monitoring or intelligence from devices and sensors.
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed expressly to run Docker. He will also discuss Rancher, an orchestration and service discovery platf...
Sonus Networks introduced the Sonus WebRTC Services Solution, a virtualized Web Real-Time Communications (WebRTC) offer, purpose-built for the Cloud. The WebRTC Services Solution provides signaling from WebRTC-to-WebRTC applications and interworking from WebRTC-to-Session Initiation Protocol (SIP), delivering advanced real-time communications capabilities on mobile applications and on websites, which are accessible via a browser.
SYS-CON Events announced today that Aria Systems, the leading innovator in recurring revenue, has been named “Bronze Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Proven by the world’s most demanding enterprises, including AAA NCNU, Constant Contact, Falck, Hootsuite, Pitney Bowes, Telekom Denmark, and VMware, Aria helps enterprises grow their recurring revenue businesses. With Aria’s end-to-end active monetization platform, global brands can get to market faster with a wider variety of products and services, while maximizin...
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes for use cases across the industrial, enterprise, and consumer segments.
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Things. Akana enables enterprises to share data as APIs, connect and integrate applications, drive part...
After making a doctor’s appointment via your mobile device, you receive a calendar invite. The day of your appointment, you get a reminder with the doctor’s location and contact information. As you enter the doctor’s exam room, the medical team is equipped with the latest tablet containing your medical history – he or she makes real time updates to your medical file. At the end of your visit, you receive an electronic prescription to your preferred pharmacy and can schedule your next appointment.
The WebRTC Summit 2014 New York, to be held June 9-11, 2015, at the Javits Center in New York, NY, announces that its Call for Papers is open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 16th International Cloud Expo, @ThingsExpo, Big Data Expo, and DevOps Summit.
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional Social, Mobile and Cloud user experiences, our solutions help large and medium-sized organizations dr...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. Join SoftLayer on June 9 at 16th Cloud Expo to learn about IBM Cloud's SoftLayer platform, explore se...
The 3rd International Internet of @ThingsExpo, co-located with the 16th International Cloud Expo - to be held June 9-11, 2015, at the Javits Center in New York City, NY - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
SYS-CON Events announced today that CommVault has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. A singular vision – a belief in a better way to address current and future data management needs – guides CommVault in the development of Singular Information Management® solutions for high-performance data protection, universal availability and sim...
SYS-CON Media announced today that 9 out of 10 " most read" DevOps articles are published by @DevOpsSummit Blog. Launched in October 2014, @DevOpsSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce softw...