By Srinivasan Sundara Rajan
December 28, 2011 07:15 AM EST | Reads: 4,726
In this article I would like to look at a few tools that are often overlooked when it comes to Big Data analytics. Organizations that have already invested heavily in the mainframe and want to keep using it can consider these tools to extend their Big Data analytics reach.
DFSORT - Sorting & Merging Large Data Sets:
- Long before RDBMSs took their place, COBOL programs had two major file manipulation operations:
- The SORT operation accepts unsequenced input and produces output in a specified sequence
- The MERGE operation compares records from two or more files and combines them in order
- DFSORT adds the ability to do faster and easier sorting, merging, copying, reporting and analysis of your business information, as well as versatile data handling at the record, fixed position/length or variable position/length field, and bit level.
- DFSORT is designed to optimize the efficiency and speed with which operations are completed through synergy with processor, device, and system features
- A COBOL program typically acts as an intermediary, handling the input files and passing their records to DFSORT
- After all the input records have been passed to DFSORT, the sorting operation is executed. This operation arranges the entire set of records in the sequence specified by keys.
- Much like SORT, the MERGE statement is also called from a COBOL job
- Execution of the MERGE statement begins the merge processing. This operation compares the keys of the records in the input files and passes the sequenced records on to create a merged output file
- According to the vendor documentation, there is no maximum number of keys, which helps support the needs of Big Data analytics processing
- Some of the advanced options of DFSORT also facilitate parallel sort processing, which fits well with the needs of Big Data analytics
- Since the workloads of Big Data analytical jobs can span multiple physical and virtual servers, including the mainframe, it is good to see that DFSORT has the option to sort records in EBCDIC, ASCII or another collating sequence. This can bring uniformity to massively parallel sorting jobs that run on heterogeneous systems
- The Job Control Language (JCL), which gives Hadoop-like management of large file processing jobs on the mainframe, has good features for specifying multiple input and output files for SORT and MERGE jobs
- As is evident, this article is not meant as a tutorial for DFSORT; the various performance features can be looked up in the mainframe manuals, or you can ask the mainframe gurus in your organization.
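To make the SORT, MERGE and collating sequence ideas above concrete, here is a minimal sketch in Python rather than in DFSORT control statements. It is only an analogy, not how DFSORT itself is invoked. The fixed-width record layout (account id in columns 0-5, date in columns 6-13), the sample records and the helper names are all assumptions made for illustration.

```python
import heapq

# Hypothetical fixed-width layout (an assumption, not from the article):
# columns 0-5 = account id, columns 6-13 = date (YYYYMMDD), rest = payload.
def sort_key(record):
    """Return the (account, date) key read from fixed positions."""
    return record[0:6], record[6:14]

def sort_records(records):
    """SORT analogue: accept unsequenced input, return it in key sequence."""
    return sorted(records, key=sort_key)

def merge_sorted(*sorted_inputs):
    """MERGE analogue: combine inputs that are already in key sequence."""
    return list(heapq.merge(*sorted_inputs, key=sort_key))

def ebcdic_key(text):
    """Collate by EBCDIC byte values (code page 037) instead of ASCII."""
    return text.encode("cp037")

file_a = sort_records(["ACC00220110105debit", "ACC00120110103credit"])
file_b = sort_records(["ACC00120110101debit", "ACC00320110102credit"])
merged = merge_sorted(file_a, file_b)

# The same three strings collate differently under the two sequences.
ascii_order = sorted(["a1", "A1", "11"])
ebcdic_order = sorted(["a1", "A1", "11"], key=ebcdic_key)
```

In ASCII, digits collate before uppercase letters and uppercase before lowercase, while in EBCDIC the order is lowercase, uppercase, digits. This is why a common collating sequence matters when sorted output from heterogeneous systems has to be merged.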
REXX:
- REXX (Restructured eXtended eXecutor) is another programming language used in the same ecosystem as COBOL and DFSORT, and it can contribute considerably to the Big Data analytical needs of enterprises
- REXX has advantages in string manipulation, dynamic data typing and storage management, and is generally considered to be very reliable and robust
- One of the most important strengths of REXX that is relevant to Big Data analytics is its "character string" handling ability
- There are useful string manipulation functions such as COPIES(), WORDS(), STRIP() and TRANSLATE(), which can go a long way toward the Map/Reduce functionality needs of typical Big Data analytical jobs
- The PARSE instruction is also used frequently in REXX programs. It can take strings from a number of sources and break them apart into constituent parts using a fairly natural notation
- PARSE is probably one of the most useful features of REXX in its positioning as a Big Data analytical tool
- The REXX PARSE instruction divides a source string into constituent parts and assigns these to symbols as directed by the governing parsing template
- REXX, DFSORT and COBOL programs are interoperable, so we could call a REXX program from COBOL, and all of these can be tied together with JCL
- Again, this note is not meant as a tutorial for REXX; a lot of good documentation is available on using the string manipulation features of REXX.
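For readers who do not know REXX, the string functions and the PARSE idea mentioned above can be approximated in Python as follows. This is a rough sketch of the default behavior only: the REXX built-ins accept more options than are shown, the `translate` helper assumes equal-length tables, and `parse_words` is a hypothetical helper that mimics only the simplest word-style parsing template.

```python
def copies(text, n):
    """Like REXX COPIES: n concatenated copies of text."""
    return text * n

def words(text):
    """Like REXX WORDS: the number of blank-delimited words."""
    return len(text.split())

def strip(text):
    """Like REXX STRIP with defaults: drop leading and trailing blanks."""
    return text.strip(" ")

def translate(text, table_out="", table_in=""):
    """Like REXX TRANSLATE: uppercase when no tables are given,
    otherwise map each character of table_in to table_out.
    Assumes both tables have the same length."""
    if not table_out and not table_in:
        return text.upper()
    return text.translate(str.maketrans(table_in, table_out))

def parse_words(source, names):
    """Word-style template: one word per name, remainder to the last name."""
    parts = source.split(None, len(names) - 1)
    parts += [""] * (len(names) - len(parts))  # missing words become empty
    return dict(zip(names, parts))

# Hypothetical log line, split much as a three-name PARSE template would.
fields = parse_words("2011-12-28 ERROR disk full on VOL001",
                     ["date", "level", "message"])
```

Here `fields["message"]` holds the rest of the line after the first two words, which mirrors how the last name in a word template receives whatever remains of the source string.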
Summary: There is a strong need for enterprises to adopt Big Data analytics and start mining the huge sets of unstructured data that have been ignored so far, in order to arrive at meaningful business decisions. While newer frameworks like Hadoop and the new breed of analytical databases are going to satisfy this need, enterprises should not spend their time picking tools and languages when it comes to Big Data analytics.
If there is a significant investment, and the organization's direction is to use legacy platforms like COBOL, JCL, REXX and DFSORT, it is only prudent to make the best use of their capabilities when arriving at options for Big Data analytics.
Big Data analytics depends mainly on Map/Reduce algorithms, which are aimed at crunching large data sets. The input files are read to create key/value pairs; map functions take these key/value pairs and generate further key/value pairs. The reducer function in turn depends on sorted key/value pairs: it iterates over them and reduces the output further.
If we look at the way this logic works, there is a heavy need for sorting, merging, string manipulation and parsing throughout. Hence the tools mentioned above, DFSORT and REXX along with COBOL, are likely to satisfy the Big Data needs of large enterprises that have already invested in mainframe compute capacity.
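The dependence of Map/Reduce on parsing and sorting can be seen in a small Python sketch. Word counting is used here only as an assumed example workload, and the function names are hypothetical; the point is that the map step is string parsing and the reduce step relies on the pairs being in key sequence first.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: parse each input line and emit one (word, 1) pair per word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: walk key-sorted pairs and sum the values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))  # the sort step
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }

lines = ["sort merge sort", "parse merge sort"]
counts = reduce_phase(map_phase(lines))
```

Without the `sorted` call, `groupby` would only combine adjacent equal keys, which is the same reason a reducer needs its key/value pairs delivered in sorted order.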
Copyright © 2011 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Srinivasan Sundara Rajan (also known as Sundar) is an Enterprise Technology Enabler for realizing business capabilities. His primary focus is enabling Agile Enterprises by facilitating the adoption of the Everything as a Service model, with particular concentration on BPaaS (Business Process as a Service). He also helps enterprises get meaningful insights from their structured, unstructured and real-time data sources. All the views expressed are Srinivasan's independent analysis of industry and solutions and are not necessarily those of his current or past organizations. Srinivasan would like to thank everyone who augmented his architectural skills with analytical ideas.