Incidental R

by Joseph Rickert

Last week, I posted a list of sessions at the Joint Statistical Meetings related to R. As it turned out, that list was only the tip of the iceberg. In some areas of statistics, such as graphics, simulation and computational statistics, the use of R is so prevalent that people working in the field often don't think to mention it. For example, in the session New Approaches to Data Exploration and Discovery, which included the presentation on the Glassbox package that figured in my original list, R was important to the analyses underlying nearly all of the talks in one way or another. The following are synopses of the talks in that session, along with some pointers to relevant R resources.

Exploring Huge Collections of Scatterplots

Statistics and visualization legend Leland Wilkinson of Skytree showed off ScagExplorer, a tool he built with Tuan Dang of the University of Illinois at Chicago to explore scagnostics (a contraction of “Scatter Plot Diagnostics” coined by John and Paul Tukey in the 1980s). ScagExplorer makes it possible to look for anomalies and search for similar distributions in huge collections of scatter plots. (The example Leland showed contained 124K plots.) The ideas and many of the visuals for the talk can be found in the paper ScagExplorer: Exploring Scatterplots by Their Scagnostics. ScagExplorer is a Java-based tool, but R users can work with the scagnostics package written by Lee Wilkinson and Anushka Anand in 2007.

Glassbox: An R Package for Visualizing Algorithmic Models

Google’s Max Ghenis presented work he did with fellow Googlers Ben Ogorek and Estevan Flores. Glassbox is an R application that attempts to provide transparency to “blackbox” algorithmic models such as Random Forests. Among other things, it calculates and plots the collective importance of groups of variables in such a model. The slides for the presentation are available, as is the package itself.

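To make "collective importance of groups of variables" a little more concrete, here is a minimal sketch of one common way to score a group of predictors: shuffle all the columns in the group together and see how much the model's error grows. This is grouped permutation importance, which is not necessarily what Glassbox computes, and it is written in plain Python so that it runs without any packages. The toy `black_box` function, the column groups and the helper names are all assumptions made up for illustration.

```python
import random

def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def group_permutation_importance(model, rows, targets, groups, n_repeats=20, seed=0):
    """Mean increase in MSE when each named group of columns is shuffled jointly."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importance = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            # One shared row order per repeat keeps the grouped columns aligned
            # with each other while breaking their link to the target.
            order = list(range(len(rows)))
            rng.shuffle(order)
            shuffled = []
            for i, row in enumerate(rows):
                new_row = list(row)
                for c in cols:
                    new_row[c] = rows[order[i]][c]
                shuffled.append(new_row)
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance

# Hypothetical stand-in for a fitted black-box model: it uses columns 0 and 1
# and ignores column 2 entirely.
def black_box(row):
    return 3.0 * row[0] + 2.0 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(50)]
targets = [black_box(r) for r in rows]

scores = group_permutation_importance(
    black_box, rows, targets, groups={"signal": [0, 1], "noise": [2]}
)
print(scores)
```

In this toy setup the "signal" group gets a positive score, because shuffling it degrades the predictions, while the "noise" group scores zero, because the model never looks at that column.
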
Google is using predictive modeling and tools such as Glassbox to better understand the characteristics of its workforce and to ask important, reflective questions such as “How can we better understand diversity?” The company also does HR modeling to see if what they know about people can give them a competitive edge in hiring. For example, Google uses data collected from people who have interviewed at the company in the past, but who did not receive offers, to try to understand Google's future hiring needs. The coolest thing about this presentation was that these guys work for the Human Resources Department! If you think that you work for a tech company, go down to HR and see if you can get some help with Random Forests.

A Web Application for Efficient Analysis of Peptide Libraries

Eric Hare of Iowa State University introduced PeLica, work he did with colleagues Timo Sieber of University Medical Center Hamburg-Eppendorf and Heike Hofmann of Iowa State University. PeLica is an interactive Shiny application that helps assess the statistical properties of peptide libraries. PeLica’s creators refer to it as a Peptide Library Calculator that acts as a front end to the R package peptider, which contains functions for evaluating the diversity of peptide libraries. The authors have done an exceptional job of using the documentation features available in Shiny to make their app a teaching tool.

To Merge or Not to Merge: An Interactive Visualization Tool for Local Merges of Mixture Model Components

Elizabeth Lorenzi of Carnegie Mellon showed the prototype for an interactive visualization tool that she is working on with Rebecca Nugent of Carnegie Mellon and Nema Dean of the University of Glasgow. The software calculates inter-component similarities of mixture model component trees and displays them as hierarchical dendrograms. Elizabeth and her colleagues are implementing this tool as an R package.

An Interactive Visualization Platform for Interpreting Topic Models

Carson Sievert of Iowa State University presented LDAvis, a general framework for visualizing topic models that he is building with Kenny Shirley of AT&T Labs. LDAvis is interactive R software that enables users to interpret and compare topics by highlighting keywords. The theory is nicely described in a recent paper, and the examples on Carson’s GitHub page are instructive and fun to play with. In the example plot, circle 26, which represents a topic, has been selected. The bar chart on the right displays the 30 most relevant terms for this topic. The red bars represent the frequency of a term in a given topic (proportional to p(term | topic)), and the gray bars represent a term's frequency across the entire corpus (proportional to p(term)).

Gravicom: A Web-Based Tool for Community Detection in Networks

Andrea Kaplan showed off an interactive application that she and her Iowa State University team members, Heike Hofmann and Daniel Nordman, are building. Gravicom is an interactive web application based on Shiny and the D3 JavaScript library that lets a user manually collect nodes into clusters in a social network graph and then save this grouping information for subsequent processing. The idea is that eyeballing a large social network and selecting “obvious” groups may be an efficient way to initialize a machine learning algorithm. Have a look at the live demo.

Human Factors Influencing Visual Statistical Inference

Mahbubul Majumder of the University of Nebraska presented joint work done with Heike Hofmann and Dianne Cook, both of Iowa State University, on identifying key factors, such as demographics, experience, training, or even the placement of figures in an array of plots, that may be important for the human analysis of visual data.
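
Going back to the LDAvis bar chart described above, the two bar lengths are easy to compute by hand for a toy example. The sketch below, in plain Python, derives p(term | topic) and p(term) from a small table of term counts and then ranks terms by the relevance measure from the Sievert and Shirley paper, λ·log p(term | topic) + (1 − λ)·log(p(term | topic) / p(term)). The vocabulary, the counts and the weight λ = 0.6 are made-up assumptions for illustration, and none of this is the LDAvis code itself.

```python
import math

# Hypothetical term counts for two topics; real values would come from a fitted
# topic model, not from this toy table. All counts are nonzero so the logs exist.
vocab = ["gene", "cell", "game", "team"]
topic_term_counts = {
    "topic_a": [40, 30, 5, 5],
    "topic_b": [2, 3, 50, 45],
}

def term_given_topic(term_counts):
    """p(term | topic): the quantity behind the red bars."""
    total = sum(term_counts)
    return [c / total for c in term_counts]

def term_marginal(all_counts):
    """p(term) over the whole corpus: the quantity behind the gray bars."""
    column_totals = [sum(col) for col in zip(*all_counts.values())]
    grand_total = sum(column_totals)
    return [t / grand_total for t in column_totals]

def relevance(p_term_topic, p_term, lam):
    """Weighted mix of log p(term | topic) and the log lift over p(term)."""
    return lam * math.log(p_term_topic) + (1.0 - lam) * math.log(p_term_topic / p_term)

p_w = term_marginal(topic_term_counts)
p_w_given_a = term_given_topic(topic_term_counts["topic_a"])

# lam=0.6 is an illustrative choice of weight, not a required setting.
relevance_scores = {
    term: relevance(p_wt, p, lam=0.6)
    for term, p_wt, p in zip(vocab, p_w_given_a, p_w)
}
ranked_terms = sorted(vocab, key=relevance_scores.get, reverse=True)
print(ranked_terms)
```

Setting the weight to 1 ranks terms purely by their frequency within the topic, while lower values favor terms that are unusually concentrated in the topic relative to the corpus as a whole.
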

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
