Incidental R

by Joseph Rickert

Last week, I posted a list of sessions at the Joint Statistical Meetings related to R. As it turned out, that list was only the tip of the iceberg. In some areas of statistics, such as graphics, simulation and computational statistics, the use of R is so prevalent that people working in the field often don't think to mention it. For example, in the session New Approaches to Data Exploration and Discovery, which included the presentation on the Glassbox package that figured in my original list, R was important to the analyses underlying nearly all of the talks in one way or another. The following are synopses of the talks in that session along with some pointers to relevant R resources.

Exploring Huge Collections of Scatterplots

Statistics and visualization legend Leland Wilkinson of Skytree showed off ScagExplorer, a tool he built with Tuan Dang of the University of Illinois at Chicago to explore scagnostics (a contraction of “Scatter Plot Diagnostics” coined by John Tukey and Paul Tukey in the 1980s). ScagExplorer makes it possible to look for anomalies and search for similar distributions in huge collections of scatter plots. (The example Leland showed contained 124K plots.) The ideas and many of the visuals for the talk can be found in the paper ScagExplorer: Exploring Scatterplots by Their Scagnostics. ScagExplorer is a Java-based tool, but R users can work with the scagnostics package written by Lee Wilkinson and Anushka Anand in 2007.

Glassbox: An R Package for Visualizing Algorithmic Models

Google’s Max Ghenis presented work he did with fellow Googlers Ben Ogorek and Estevan Flores. Glassbox is an R application that attempts to provide transparency to “blackbox” algorithmic models such as Random Forests. Among other things, it calculates and plots the collective importance of groups of variables in such a model. The slides for the presentation are available, as is the package itself.
Google is using predictive modeling and tools such as Glassbox to better understand the characteristics of its workforce and to ask important, reflective questions such as “How can we better understand diversity?” The company also does HR modeling to see if what they know about people can give them a competitive edge in hiring. For example, Google uses data collected from people who have interviewed at the company in the past, but who have not received offers from Google, to try to understand Google's future hiring needs. The coolest thing about this presentation was that these guys work for the Human Resources Department! If you think that you work for a tech company, go down to HR and see if you can get some help with Random Forests.

A Web Application for Efficient Analysis of Peptide Libraries

Eric Hare of Iowa State University introduced PeLica, work he did with colleagues Timo Sieber of University Medical Center Hamburg-Eppendorf and Heike Hofmann of Iowa State University. PeLica is an interactive Shiny application to help assess the statistical properties of peptide libraries. PeLica’s creators refer to it as a Peptide Library Calculator that acts as a front end to the R package peptider, which contains functions for evaluating the diversity of peptide libraries. The authors have done an exceptional job of using the documentation features available in Shiny to make their app a teaching tool.

To Merge or Not to Merge: An Interactive Visualization Tool for Local Merges of Mixture Model Components

Elizabeth Lorenzi of Carnegie Mellon showed the prototype for an interactive visualization tool that she is working on with Rebecca Nugent of Carnegie Mellon and Nema Dean of the University of Glasgow. The software calculates inter-component similarities of mixture model component trees and displays them as hierarchical dendrograms. Elizabeth and her colleagues are implementing this tool as an R package.
An Interactive Visualization Platform for Interpreting Topic Models

Carson Sievert of Iowa State University presented LDAvis, a general framework for visualizing topic models that he is building with Kenny Shirley of AT&T Labs. LDAvis is interactive R software that enables users to interpret and compare topics by highlighting keywords. The theory is nicely described in a recent paper, and the examples on Carson’s GitHub page are instructive and fun to play with. In the example plot from the original post, circle 26, representing a topic, has been selected. The bar chart on the right displays the 30 most relevant terms for this topic. The red bars represent the frequency of a term in a given topic (proportional to p(term | topic)), and the gray bars represent a term's frequency across the entire corpus (proportional to p(term)).

Gravicom: A Web-Based Tool for Community Detection in Networks

Andrea Kaplan showed off an interactive application that she and her Iowa State University team members, Heike Hofmann and Daniel Nordman, are building. Gravicom is an interactive web application based on Shiny and the D3 JavaScript library that lets a user manually collect nodes into clusters in a social network graph and then save this grouping information for subsequent processing. The idea is that eyeballing a large social network and selecting “obvious” groups may be an efficient way to initialize a machine learning algorithm. Have a look at the live demo.

Human Factors Influencing Visual Statistical Inference

Mahbubul Majumder of the University of Nebraska presented joint work done with Heike Hofmann and Dianne Cook, both of Iowa State University, on identifying key factors, such as demographics, experience, training, or even the placement of figures in an array of plots, that may be important for the human analysis of visual data.
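
The two bar lengths in the LDAvis chart described above, and the idea of a "most relevant" term, can be sketched in a few lines. This is only a minimal illustration in Python, not LDAvis's own R code: the toy counts, the function names and the weight lam are all hypothetical, raw token counts stand in for a fitted topic model, and the relevance score (lam * log p(term | topic) + (1 - lam) * log of p(term | topic) / p(term)) follows the Sievert and Shirley paper as best I can tell.

```python
import math

# Hypothetical toy data: counts[topic][term] = tokens of `term` assigned to `topic`.
counts = {
    "topic_a": {"model": 30, "data": 20, "plot": 10},
    "topic_b": {"model": 5, "data": 25, "network": 30},
}


def term_given_topic(counts, topic):
    """p(term | topic): the quantity the red bars are proportional to."""
    total = sum(counts[topic].values())
    return {term: n / total for term, n in counts[topic].items()}


def term_marginal(counts):
    """p(term) over the whole corpus: the quantity the gray bars are proportional to."""
    totals = {}
    for terms in counts.values():
        for term, n in terms.items():
            totals[term] = totals.get(term, 0) + n
    grand_total = sum(totals.values())
    return {term: n / grand_total for term, n in totals.items()}


def relevance_ranking(counts, topic, lam=0.6):
    """Return the topic's terms sorted by relevance, most relevant first.

    lam is an assumed weight in [0, 1]: 1.0 ranks purely by p(term | topic),
    0.0 ranks purely by the ratio p(term | topic) / p(term).
    """
    p_wt = term_given_topic(counts, topic)
    p_w = term_marginal(counts)
    scores = {
        term: lam * math.log(p) + (1 - lam) * math.log(p / p_w[term])
        for term, p in p_wt.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for lam in (1.0, 0.6, 0.0):
        print(lam, relevance_ranking(counts, "topic_a", lam))
```

With lam close to 1, common terms such as "model" dominate the ranking; with lam close to 0, terms that are rare overall but concentrated in the topic, such as "plot", rise to the top. That trade-off is what comparing the red and gray bars is meant to expose.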

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
