Welcome!

Java IoT Authors: AppDynamics Blog, Liz McMillan, Akhil Sahai, Elizabeth White, Dana Gardner

Blog Feed Post

An Introduction to SAS for R Programmers

by Joseph Rickert Life decisions are usually much too complicated to be attributed to any single cause, but one important reason that I am here at Revolution today is that I ignored suggestions from well-meaning faculty back in graduate school to work more in SAS rather than doing everything in R. There was a heavy emphasis on SAS then: the faculty were worried about us getting jobs. This was before the rise of the data scientist and the the corporate model my professors had in mind was: PhD statisticians do statistics and everyone else writes SAS code. I would not be surprised if this is still not the prevailing model in traditional Statistics programs. My bet is there are statisticians everywhere who have yet to come to grips with the concept of a “data scientist”.  Anyway, because of the great cosmic balance, or the bad karma that comes from ignoring well-intentioned advice and the fact that there are quite a few companies out there that want to convert their SAS code to R, I occasionally get to look at SAS code. In the process of interviewing candidates for this kind of work it struck me that there are many people coming to data science through the programming or machine learning routes who have some R knowledge as well as experience with Java, Python and C++ who have never worked with SAS. To this group I offer the following very brief “Introduction to SAS for R Programmers”. So what is SAS exactly? Originally, SAS  stood for “Statistical Analysis System”. Indeed, towards the beginning of his invaluable book, “R for SAS and SPSS Users”, Bob Muenchen characterizes SAS as a system for statistical computation that has five main components: A data management system for reading, transforming and organizing data (The Data Step) A large number of procedures (PROCs) for statistical analysis and graphics The Output Delivery System for extracting output from PROCs and customizing printed output A macro language for programming in the data step and calling PROCS The Interactive Matrix programming language (IML) for developing new algorithms SAS is not a single programming language. It is an entire ecosystem of products (not all seamlessly integrated) that contains at least two languages! While becoming a competent SAS programmer clearly requires mastering an impressive number of skills, quite a bit can be accomplished in SAS with a basic knowledge of the Data Step and the more common procedures (PROCs) in the base and Stat packages. Moreover, as it turns out, these two foundational components of SAS are the very two things that an R programmer is likely to find most strange about SAS. There is really only one data structure in SAS, a file with rows of observations and columns of variables that always gets processed by means of an implied loop. A Data Step “program” starts with the first row of a SAS file executes all of the code it encounters until it comes to a run; statement then looks at the second row of the file and runs through the code again. The Data Step proceeds sequentially through the entire file in this fashion. An excellent presentation from Steven J. First illustrates the process nicely. See slides 36 through 45 for an example of SAS code with a very clear PowerPoint animation of how this all works. It is true that SAS programmers can work with arrays, but this is actually a computational sleight of hand. Arrays are actually special columns in a data set. R programmers are used to an interactive computational experience. Within a session, at any point in time the objects that resulted from a previous computation are available as inputs to the next calculation. There is always a sense of moving forward. If you didn’t compute something as part of the last function you ran, just write another function and compute it now. In SAS, however, one uses the various PROCS to conjure the results in a methodical, premeditated way. For example, something like the following code would run a simple regression in SAS sending the results to the console. proc reg data = myData;model Y = X;run:  However, if you wanted to have the fitted values and residuals available for a further computation, you would have to rerun the regression specifying an output file and the keywords for computing the fitted values and residuals. proc reg DATA = myData;MODEL Y = X / stb clb;OUTPUT OUT=OUTREG P=PREDCIT R=RESED;run; Kathy Welch a statistical consultant at the University of Michigan, provides a very clear example of this linear way of working. Most SAS programming probably gets done by writing SAS macros. Look at Bob Muenchen’s book (or this article) for practical examples of R functions to replace SAS macros. For more advanced work,the SAS/Tool Kit (yet another add on) allows SAS probrammers to write custom procedures. But, from a R programmer’s perspective probably the most exciting SAS product is the IML System which provides the ability to call R from within an IML procedure. The documentation  provides an example of transferring data stored in SAS/IML vectors to R, running a model in R and then, importing the results back into SAS/IML vectors. Actually, if you are an R programmer, all you might really want to do is import data from SAS to R. Thre are at least five ways to do this using functions from various open source R libraries. (Note that some of these methods require preparation steps to be done in SAS.) The document “An Introduction to S and The Hmisc and Design Libraries” on CRAN is also helpful. However, I recommend using rxImport feature in RevoScaleR package that ships with Revolution R Enterprise. Importing a SAS file with rxImport looks like this: rxImport(inData=data,outFile="sasFileName") Not only is it a one step process that does not require having SAS installed on your system, but it reads .sas7bdat files directly into Revolution Analytics' .xdf file format. You can easily work with SAS files that are too large to fit into memory Once in .xdf file format the data can be worked on with RevoScaleR’s parallel external memory algorithms (PEMAs) or written to .csv files or data frames.

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
Is your aging software platform suffering from technical debt while the market changes and demands new solutions at a faster clip? It’s a bold move, but you might consider walking away from your core platform and starting fresh. ReadyTalk did exactly that. In his General Session at 19th Cloud Expo, Michael Chambliss, Head of Engineering at ReadyTalk, will discuss why and how ReadyTalk diverted from healthy revenue and over a decade of audio conferencing product development to start an innovati...
Amazon has gradually rolled out parts of its IoT offerings in the last year, but these are just the tip of the iceberg. In addition to optimizing their back-end AWS offerings, Amazon is laying the ground work to be a major force in IoT – especially in the connected home and office. Amazon is extending its reach by building on its dominant Cloud IoT platform, its Dash Button strategy, recently announced Replenishment Services, the Echo/Alexa voice recognition control platform, the 6-7 strategic...
It’s 2016: buildings are smart, connected and the IoT is fundamentally altering how control and operating systems work and speak to each other. Platforms across the enterprise are networked via inexpensive sensors to collect massive amounts of data for analytics, information management, and insights that can be used to continuously improve operations. In his session at @ThingsExpo, Brian Chemel, Co-Founder and CTO of Digital Lumens, will explore: The benefits sensor-networked systems bring to ...
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
SYS-CON Events announced today that Venafi, the Immune System for the Internet™ and the leading provider of Next Generation Trust Protection, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Venafi is the Immune System for the Internet™ that protects the foundation of all cybersecurity – cryptographic keys and digital certificates – so they can’t be misused by bad guys in attacks...
There will be new vendors providing applications, middleware, and connected devices to support the thriving IoT ecosystem. This essentially means that electronic device manufacturers will also be in the software business. Many will be new to building embedded software or robust software. This creates an increased importance on software quality, particularly within the Industrial Internet of Things where business-critical applications are becoming dependent on products controlled by software. Qua...
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
Large scale deployments present unique planning challenges, system commissioning hurdles between IT and OT and demand careful system hand-off orchestration. In his session at @ThingsExpo, Jeff Smith, Senior Director and a founding member of Incenergy, will discuss some of the key tactics to ensure delivery success based on his experience of the last two years deploying Industrial IoT systems across four continents.
CenturyLink has announced that application server solutions from GENBAND are now available as part of CenturyLink’s Networx contracts. The General Services Administration (GSA)’s Networx program includes the largest telecommunications contract vehicles ever awarded by the federal government. CenturyLink recently secured an extension through spring 2020 of its offerings available to federal government agencies via GSA’s Networx Universal and Enterprise contracts. GENBAND’s EXPERiUS™ Application...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and shared the must-have mindsets for removing complexity from the develo...
SYS-CON Events announced today that MangoApps will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. MangoApps provides modern company intranets and team collaboration software, allowing workers to stay connected and productive from anywhere in the world and from any device.
The IETF draft standard for M2M certificates is a security solution specifically designed for the demanding needs of IoT/M2M applications. In his session at @ThingsExpo, Brian Romansky, VP of Strategic Technology at TrustPoint Innovation, explained how M2M certificates can efficiently enable confidentiality, integrity, and authenticity on highly constrained devices.
The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportuni...
In today's uber-connected, consumer-centric, cloud-enabled, insights-driven, multi-device, global world, the focus of solutions has shifted from the product that is sold to the person who is buying the product or service. Enterprises have rebranded their business around the consumers of their products. The buyer is the person and the focus is not on the offering. The person is connected through multiple devices, wearables, at home, on the road, and in multiple locations, sometimes simultaneously...
“delaPlex Software provides software outsourcing services. We have a hybrid model where we have onshore developers and project managers that we can place anywhere in the U.S. or in Europe,” explained Manish Sachdeva, CEO at delaPlex Software, in this SYS-CON.tv interview at @ThingsExpo, held June 7-9, 2016, at the Javits Center in New York City, NY.
From wearable activity trackers to fantasy e-sports, data and technology are transforming the way athletes train for the game and fans engage with their teams. In his session at @ThingsExpo, will present key data findings from leading sports organizations San Francisco 49ers, Orlando Magic NBA team. By utilizing data analytics these sports orgs have recognized new revenue streams, doubled its fan base and streamlined costs at its stadiums. John Paul is the CEO and Founder of VenueNext. Prior ...
"We've discovered that after shows 80% if leads that people get, 80% of the conversations end up on the show floor, meaning people forget about it, people forget who they talk to, people forget that there are actual business opportunities to be had here so we try to help out and keep the conversations going," explained Jeff Mesnik, Founder and President of ContentMX, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 19th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world and ThingsExpo Silicon Valley Call for Papers is now open.
The IoT is changing the way enterprises conduct business. In his session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, discussed how businesses can gain an edge over competitors by empowering consumers to take control through IoT. He cited examples such as a Washington, D.C.-based sports club that leveraged IoT and the cloud to develop a comprehensive booking system. He also highlighted how IoT can revitalize and restore outdated business models, making them profitable ...
With 15% of enterprises adopting a hybrid IT strategy, you need to set a plan to integrate hybrid cloud throughout your infrastructure. In his session at 18th Cloud Expo, Steven Dreher, Director of Solutions Architecture at Green House Data, discussed how to plan for shifting resource requirements, overcome challenges, and implement hybrid IT alongside your existing data center assets. Highlights included anticipating workload, cost and resource calculations, integrating services on both sides...