Welcome!

Java IoT Authors: Yeshim Deniz, Elizabeth White, Pat Romanski, Carmen Gonzalez, Liz McMillan

Blog Feed Post

An Introduction to SAS for R Programmers

by Joseph Rickert Life decisions are usually much too complicated to be attributed to any single cause, but one important reason that I am here at Revolution today is that I ignored suggestions from well-meaning faculty back in graduate school to work more in SAS rather than doing everything in R. There was a heavy emphasis on SAS then: the faculty were worried about us getting jobs. This was before the rise of the data scientist and the the corporate model my professors had in mind was: PhD statisticians do statistics and everyone else writes SAS code. I would not be surprised if this is still not the prevailing model in traditional Statistics programs. My bet is there are statisticians everywhere who have yet to come to grips with the concept of a “data scientist”.  Anyway, because of the great cosmic balance, or the bad karma that comes from ignoring well-intentioned advice and the fact that there are quite a few companies out there that want to convert their SAS code to R, I occasionally get to look at SAS code. In the process of interviewing candidates for this kind of work it struck me that there are many people coming to data science through the programming or machine learning routes who have some R knowledge as well as experience with Java, Python and C++ who have never worked with SAS. To this group I offer the following very brief “Introduction to SAS for R Programmers”. So what is SAS exactly? Originally, SAS  stood for “Statistical Analysis System”. Indeed, towards the beginning of his invaluable book, “R for SAS and SPSS Users”, Bob Muenchen characterizes SAS as a system for statistical computation that has five main components: A data management system for reading, transforming and organizing data (The Data Step) A large number of procedures (PROCs) for statistical analysis and graphics The Output Delivery System for extracting output from PROCs and customizing printed output A macro language for programming in the data step and calling PROCS The Interactive Matrix programming language (IML) for developing new algorithms SAS is not a single programming language. It is an entire ecosystem of products (not all seamlessly integrated) that contains at least two languages! While becoming a competent SAS programmer clearly requires mastering an impressive number of skills, quite a bit can be accomplished in SAS with a basic knowledge of the Data Step and the more common procedures (PROCs) in the base and Stat packages. Moreover, as it turns out, these two foundational components of SAS are the very two things that an R programmer is likely to find most strange about SAS. There is really only one data structure in SAS, a file with rows of observations and columns of variables that always gets processed by means of an implied loop. A Data Step “program” starts with the first row of a SAS file executes all of the code it encounters until it comes to a run; statement then looks at the second row of the file and runs through the code again. The Data Step proceeds sequentially through the entire file in this fashion. An excellent presentation from Steven J. First illustrates the process nicely. See slides 36 through 45 for an example of SAS code with a very clear PowerPoint animation of how this all works. It is true that SAS programmers can work with arrays, but this is actually a computational sleight of hand. Arrays are actually special columns in a data set. R programmers are used to an interactive computational experience. Within a session, at any point in time the objects that resulted from a previous computation are available as inputs to the next calculation. There is always a sense of moving forward. If you didn’t compute something as part of the last function you ran, just write another function and compute it now. In SAS, however, one uses the various PROCS to conjure the results in a methodical, premeditated way. For example, something like the following code would run a simple regression in SAS sending the results to the console. proc reg data = myData;model Y = X;run:  However, if you wanted to have the fitted values and residuals available for a further computation, you would have to rerun the regression specifying an output file and the keywords for computing the fitted values and residuals. proc reg DATA = myData;MODEL Y = X / stb clb;OUTPUT OUT=OUTREG P=PREDCIT R=RESED;run; Kathy Welch a statistical consultant at the University of Michigan, provides a very clear example of this linear way of working. Most SAS programming probably gets done by writing SAS macros. Look at Bob Muenchen’s book (or this article) for practical examples of R functions to replace SAS macros. For more advanced work,the SAS/Tool Kit (yet another add on) allows SAS probrammers to write custom procedures. But, from a R programmer’s perspective probably the most exciting SAS product is the IML System which provides the ability to call R from within an IML procedure. The documentation  provides an example of transferring data stored in SAS/IML vectors to R, running a model in R and then, importing the results back into SAS/IML vectors. Actually, if you are an R programmer, all you might really want to do is import data from SAS to R. Thre are at least five ways to do this using functions from various open source R libraries. (Note that some of these methods require preparation steps to be done in SAS.) The document “An Introduction to S and The Hmisc and Design Libraries” on CRAN is also helpful. However, I recommend using rxImport feature in RevoScaleR package that ships with Revolution R Enterprise. Importing a SAS file with rxImport looks like this: rxImport(inData=data,outFile="sasFileName") Not only is it a one step process that does not require having SAS installed on your system, but it reads .sas7bdat files directly into Revolution Analytics' .xdf file format. You can easily work with SAS files that are too large to fit into memory Once in .xdf file format the data can be worked on with RevoScaleR’s parallel external memory algorithms (PEMAs) or written to .csv files or data frames.

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
SYS-CON Events announced today that Tappest will exhibit MooseFS at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. MooseFS is a breakthrough concept in the storage industry. It allows you to secure stored data with either duplication or erasure coding using any server. The newest – 4.0 version of the software enables users to maintain the redundancy level with even 50% less hard drive space required. The software func...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 add...
SYS-CON Events announced today that EARP will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. "We are a software house, so we perfectly understand challenges that other software houses face in their projects. We can augment a team, that will work with the same standards and processes as our partners' internal teams. Our teams will deliver the same quality within the required time and budget just as our partn...
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software in the hope of capturing value in IoT. Although IoT is relatively new in the market, it has already gone through many promotional terms such as IoE, IoX, SDX, Edge/Fog, Mist Compute, etc. Ultimately, irrespective of the name, it is about deriving value from independent software assets participating in an ecosystem as one comprehensive solution.
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
SYS-CON Events announced today that Systena America will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Systena Group has been in business for various software development and verification in Japan, US, ASEAN, and China by utilizing the knowledge we gained from all types of device development for various industries including smartphones (Android/iOS), wireless communication, security technology and IoT serv...
Amazon started as an online bookseller 20 years ago. Since then, it has evolved into a technology juggernaut that has disrupted multiple markets and industries and touches many aspects of our lives. It is a relentless technology and business model innovator driving disruption throughout numerous ecosystems. Amazon’s AWS revenues alone are approaching $16B a year making it one of the largest IT companies in the world. With dominant offerings in Cloud, IoT, eCommerce, Big Data, AI, Digital Assista...
SYS-CON Events announced today that Outscale will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Outscale's technology makes an automated and adaptable Cloud available to businesses, supporting them in the most complex IT projects while controlling their operational aspects. You boost your IT infrastructure's reactivity, with request responses that only take a few seconds.
Everywhere we turn in our industry we can find strong opinions about the direction, type and nature of cloud’s impact on computing and business. Another word that is used in every context in our industry is “hybrid.” In his session at 20th Cloud Expo, Alvaro Gonzalez, Director of Technical, Partner and Field Marketing at Peak 10, will use a combination of a few conceptual props and some research recently commissioned by Peak 10 to offer a real-world consideration of how the various categories of...
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
DevOps at Cloud Expo – being held October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real r...
Five years ago development was seen as a dead-end career, now it’s anything but – with an explosion in mobile and IoT initiatives increasing the demand for skilled engineers. But apart from having a ready supply of great coders, what constitutes true ‘DevOps Royalty’? It’ll be the ability to craft resilient architectures, supportability, security everywhere across the software lifecycle. In his keynote at @DevOpsSummit at 20th Cloud Expo, Jeffrey Scheaffer, GM and SVP, Continuous Delivery Busine...
SYS-CON Events announced today that Outscale, a global pure play Infrastructure as a Service provider and strategic partner of Dassault Systèmes, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2010, Outscale simplifies infrastructure complexities and boosts the business agility of its customers. Outscale delivers a secure, reliable and industrial strength solution for its customers, which in...
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
SYS-CON Events announced today that Peak 10, Inc., a national IT infrastructure and cloud services provider, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Peak 10 provides reliable, tailored data center and network services, cloud and managed services. Its solutions are designed to scale and adapt to customers’ changing business needs, enabling them to lower costs, improve performance and focus intern...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
In order to meet the rapidly changing demands of today’s customers, companies are continually forced to redefine their business strategies in order to meet these needs, stay relevant and continue to see profitable growth. IoT deployment and development is integral in this transformation, and today businesses are increasingly seeing the value of investing their resources into IoT deployments. These technologies are able increase ROI through projects such as connecting supply chains or enabling sm...
In his opening keynote at 20th Cloud Expo, Michael Maximilien, Research Scientist, Architect, and Engineer at IBM, will motivate why realizing the full potential of the cloud and social data requires artificial intelligence. By mixing Cloud Foundry and the rich set of Watson services, IBM's Bluemix is the best cloud operating system for enterprises today, providing rapid development and deployment of applications that can take advantage of the rich catalog of Watson services to help drive insigh...
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
SYS-CON Events announced today that Super Micro Computer, Inc., a global leader in compute, storage and networking technologies, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/...