An Introduction to SAS for R Programmers

by Joseph Rickert

Life decisions are usually much too complicated to be attributed to any single cause, but one important reason that I am here at Revolution today is that I ignored suggestions from well-meaning faculty back in graduate school to work more in SAS rather than doing everything in R. There was a heavy emphasis on SAS then: the faculty were worried about us getting jobs. This was before the rise of the data scientist, and the corporate model my professors had in mind was: PhD statisticians do statistics and everyone else writes SAS code. I would not be surprised if this is still the prevailing model in traditional Statistics programs. My bet is there are statisticians everywhere who have yet to come to grips with the concept of a “data scientist”.

Anyway, because of the great cosmic balance, or the bad karma that comes from ignoring well-intentioned advice, and the fact that there are quite a few companies out there that want to convert their SAS code to R, I occasionally get to look at SAS code. In the process of interviewing candidates for this kind of work, it struck me that there are many people coming to data science through the programming or machine learning routes who have some R knowledge, as well as experience with Java, Python and C++, but who have never worked with SAS. To this group I offer the following very brief “Introduction to SAS for R Programmers”.

So what is SAS exactly? Originally, SAS stood for “Statistical Analysis System”.
Indeed, towards the beginning of his invaluable book, “R for SAS and SPSS Users”, Bob Muenchen characterizes SAS as a system for statistical computation that has five main components:

- A data management system for reading, transforming and organizing data (the Data Step)
- A large number of procedures (PROCs) for statistical analysis and graphics
- The Output Delivery System for extracting output from PROCs and customizing printed output
- A macro language for programming in the Data Step and calling PROCs
- The Interactive Matrix programming Language (IML) for developing new algorithms

SAS is not a single programming language. It is an entire ecosystem of products (not all seamlessly integrated) that contains at least two languages! While becoming a competent SAS programmer clearly requires mastering an impressive number of skills, quite a bit can be accomplished in SAS with a basic knowledge of the Data Step and the more common procedures (PROCs) in the Base and Stat packages. Moreover, as it turns out, these two foundational components of SAS are the very two things that an R programmer is likely to find most strange about SAS.

There is really only one data structure in SAS: a file with rows of observations and columns of variables that always gets processed by means of an implied loop. A Data Step “program” starts with the first row of a SAS file, executes all of the code it encounters until it comes to a run; statement, then looks at the second row of the file and runs through the code again. The Data Step proceeds sequentially through the entire file in this fashion. An excellent presentation from Steven J. First illustrates the process nicely. See slides 36 through 45 for an example of SAS code with a very clear PowerPoint animation of how this all works. It is true that SAS programmers can work with arrays, but this is a computational sleight of hand: arrays are actually special columns in a data set.

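
For R programmers, the implied loop may be easier to picture as an explicit one. The following is only a rough sketch in R, not a description of how SAS is implemented: the data frame my_data, the derived column ratio and the threshold of 2 are all made up for illustration. It mimics a Data Step that reads each observation, computes a new variable, and keeps only some rows, and then shows the vectorized form an R programmer would normally write.

```r
# Hypothetical data standing in for a SAS data set (names are illustrative).
my_data <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# Data Step style: the "program" body runs once per observation.
# Roughly analogous (as an assumption, not a translation tool) to:
#   data out; set my_data; ratio = y / x; if ratio > 2; run;
data_step <- function(input) {
  kept <- list()
  for (i in seq_len(nrow(input))) {
    obs <- input[i, ]               # read the next observation
    obs$ratio <- obs$y / obs$x      # compute a new variable for this row
    if (obs$ratio > 2) {            # keep the row only if it passes the test
      kept[[length(kept) + 1]] <- obs
    }
  }
  do.call(rbind, kept)              # NULL if no observation was kept
}

step_result <- data_step(my_data)

# Idiomatic R: operate on whole columns at once, no explicit loop.
vec_result <- transform(my_data, ratio = y / x)
vec_result <- vec_result[vec_result$ratio > 2, ]

print(step_result)
print(vec_result)
```

Both versions keep the same observations; the difference is that the first makes the row-at-a-time processing visible, which is the mental model a Data Step assumes.
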
R programmers are used to an interactive computational experience. Within a session, at any point in time the objects that resulted from a previous computation are available as inputs to the next calculation. There is always a sense of moving forward. If you didn’t compute something as part of the last function you ran, just write another function and compute it now. In SAS, however, one uses the various PROCs to conjure the results in a methodical, premeditated way. For example, something like the following code would run a simple regression in SAS, sending the results to the console.

    proc reg data = myData;
      model Y = X;
    run;

However, if you wanted to have the fitted values and residuals available for a further computation, you would have to rerun the regression, specifying an output file and the keywords for computing the fitted values and residuals.

    proc reg DATA = myData;
      MODEL Y = X / stb clb;
      OUTPUT OUT=OUTREG P=PREDICT R=RESID;
    run;

Kathy Welch, a statistical consultant at the University of Michigan, provides a very clear example of this linear way of working.

Most SAS programming probably gets done by writing SAS macros. Look at Bob Muenchen’s book (or this article) for practical examples of R functions to replace SAS macros. For more advanced work, the SAS/Tool Kit (yet another add-on) allows SAS programmers to write custom procedures. But, from an R programmer’s perspective, probably the most exciting SAS product is the IML System, which provides the ability to call R from within an IML procedure. The documentation provides an example of transferring data stored in SAS/IML vectors to R, running a model in R, and then importing the results back into SAS/IML vectors.

Actually, if you are an R programmer, all you might really want to do is import data from SAS to R. There are at least five ways to do this using functions from various open source R libraries. (Note that some of these methods require preparation steps to be done in SAS.)

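
As a hedged illustration of what such an import can look like, here is a small sketch built on two functions I am confident exist: read_sas() from the haven package (for .sas7bdat files) and read.xport() from the foreign package (for SAS transport files, which have to be exported from SAS first). These are my own picks and not necessarily among the methods the author had in mind; the wrapper read_sas_file() and the file names in the comments are hypothetical.

```r
# Hypothetical helper: choose a reader based on the file extension.
# Assumes haven is installed for .sas7bdat files; foreign ships with R.
read_sas_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "sas7bdat") {
    haven::read_sas(path)        # native SAS data set, no SAS install needed
  } else if (ext == "xpt") {
    foreign::read.xport(path)    # transport file created beforehand in SAS
  } else {
    stop("unsupported SAS file extension: ", ext)
  }
}

# Usage (file names are placeholders, so the calls are left commented out):
# sales  <- read_sas_file("sales.sas7bdat")
# survey <- read_sas_file("survey.xpt")
```

Either call returns a data frame (haven returns a tibble), so everything downstream is ordinary R.
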
The document “An Introduction to S and The Hmisc and Design Libraries” on CRAN is also helpful. However, I recommend using the rxImport function in the RevoScaleR package that ships with Revolution R Enterprise. Importing a SAS file with rxImport looks like this:

    rxImport(inData=data,outFile="sasFileName")

Not only is it a one-step process that does not require having SAS installed on your system, but it also reads .sas7bdat files directly into Revolution Analytics' .xdf file format. You can easily work with SAS files that are too large to fit into memory. Once in .xdf file format, the data can be worked on with RevoScaleR’s parallel external memory algorithms (PEMAs) or written to .csv files or data frames.
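
To round off the earlier PROC REG example from the R side: in R the fitted model is an object that stays around, so fitted values and residuals can be pulled out afterwards without rerunning anything. The sketch below uses only base R; the simulated data and the column names predicted and residual are assumptions made for illustration.

```r
# Simulated data; myData, X and Y echo the names used in the SAS snippets.
set.seed(1)
myData <- data.frame(X = 1:20)
myData$Y <- 3 + 2 * myData$X + rnorm(20)

fit <- lm(Y ~ X, data = myData)   # fit the regression once
print(summary(fit))               # coefficient table, printed to the console

# Later in the session: reuse the same fit, no second regression needed.
myData$predicted <- fitted(fit)
myData$residual  <- resid(fit)
print(confint(fit))               # confidence limits for the coefficients

head(myData)
```

The point is the workflow rather than the model: the object fit plays the role that the OUTPUT data set plays in SAS, but it exists whether or not you planned for it in advance.
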

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
