
Thinking Like a Data Scientist: Part 2 By @Schmarzo | @BigDataExpo [#BigData]

Uncovering new variables or metrics that would be excellent predictors of business performance

In "Thinking Like a Data Scientist - Part I", we examined the challenges of getting business users to think like data scientists when contemplating where and how to leverage big data to drive business value. We introduced a "Thinking Like a Data Scientist" process that starts with identifying and understanding the organization's top-level strategic business initiatives, then uses a "Strategic Nouns" technique to create potential business questions that are descriptive, predictive or prescriptive in nature.

We will now complete this exercise by introducing two additional techniques that we can use to uncover new variables or metrics that would be excellent predictors of business performance.

Thinking Like a Data Scientist Process (Continued)
Step 4: "By" Analysis.
"By" Analysis is a technique for leveraging a business stakeholder's natural question and query process to uncover:

  • Additional data sources
  • Additional dimensional entity characteristics
  • Additional areas for analytics exploration

"By" Analysis exploratory sentence format looks like the following:

  • "I want to see sales and product margin by product category, store, store remodel date, day of week, store demographics, and customer demographics"
  • "I want to trend hospital admissions by disease category, zip code, patient demographics, hospital size, area demographics and day of week"
  • "I want to compare current versus previous maintenance issues by turbine, turbine manufacturer, date installed, last maintenance date, maintenance person and weather conditions"
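
Each "By" sentence maps naturally onto a group-by aggregation: the metrics named before "by" get aggregated, and each phrase after it becomes a grouping dimension. Here is a minimal Python sketch of that idea. The records, field names (`category`, `store`, `sales`, `margin`) and the `by_analysis` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records; field names and values are illustrative only.
SALES = [
    {"category": "running", "store": "A", "sales": 120.0, "margin": 30.0},
    {"category": "running", "store": "B", "sales": 80.0, "margin": 16.0},
    {"category": "basketball", "store": "A", "sales": 200.0, "margin": 70.0},
    {"category": "running", "store": "A", "sales": 60.0, "margin": 12.0},
]


def by_analysis(records, metrics, dimensions):
    """Sum each metric for every combination of the 'by' dimensions."""
    totals = defaultdict(lambda: {m: 0.0 for m in metrics})
    for row in records:
        # The grouping key is the tuple of dimension values for this row.
        key = tuple(row[d] for d in dimensions)
        for m in metrics:
            totals[key][m] += row[m]
    return dict(totals)


# "I want to see sales and margin by product category and store"
result = by_analysis(SALES, ["sales", "margin"], ["category", "store"])
for key, values in sorted(result.items()):
    print(key, values)
```

Every new "by" phrase that surfaces in a brainstorming session is simply another entry in `dimensions`, which is why the technique is so good at exposing additional data sources you would need to acquire.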

Check out my blog titled "Leverage By Analysis To Expand Your Data Science Perspectives", which covers the "By" Analysis in a bit more detail.

Figure 3 shows an example of "By" Analysis for a hypothetical Foot Locker merchandising example from the perspective of the customer. We asked the business users (in a facilitated brainstorming session) to brainstorm the different dimensions and/or attributes of the strategic noun upon which they were focused. You would do this same exercise for each of your strategic nouns.


Figure 3: Foot Locker "By" Analysis Example

The significant number and variety of "By" dimensions and attributes that can surface in a brainstorming session can lead to incredible insight. And remember as you go through this process, all ideas are worthy of consideration; this is not the point to try to filter the creative ideas or handcuff the creative thinking process!

Step 5: Score Technique. The purpose of the "Score" technique is to look for groupings of strategic noun dimensions and attributes that can be combined to create a more predictive and actionable score. These scores are critical components of our "thinking like a data scientist" process because they support the decisions that we are trying to make, and/or the actions or outcomes that we are trying to predict, with respect to our targeted business initiative.

Scores are very important constructs in the world of data science, and can help to cement the business stakeholders' buy-in to the data science process. The most familiar score example might be the FICO[1] score, which combines multiple questions and dimensions about a loan applicant's financial history to create a single score that lenders use to predict a borrower's ability to repay a loan (see Figure 4).


Figure 4: FICO Score Example

Scores can be created to provide predictive insights across a number of different industries and across a number of different business initiatives. Figure 5 shows some example scores from different industries.


Figure 5: Sample Scores Across Different Industries

So let's build off of the variables and metrics that were uncovered in the "By" Analysis and see if we can integrate any of those variables or metrics into a higher-level score. In our Foot Locker example, we might want to group the Favorite sports, Favorite teams, High School sports and College sports into a score that measures that individual's "Sports Team Passion." We might discover other potential scores around their level of current "Athletic Activity" (see Figure 6).


Figure 6: Foot Locker Predictive Scores
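
One simple way to roll several "By" attributes into a single score is a weighted average of normalized attributes. The Python sketch below does this for a "Sports Team Passion" score. The attribute names loosely follow the Foot Locker example, but the weights, the 0-to-1 attribute values, the 0-to-100 scale and the `composite_score` helper are all assumptions for illustration. In practice the weights would come from analytic modeling, not from hand-picked numbers.

```python
# Hypothetical weights for a "Sports Team Passion" score; hand-picked here
# purely for illustration.
PASSION_WEIGHTS = {
    "favorite_sports": 2,
    "favorite_teams": 4,
    "high_school_sports": 1,
    "college_sports": 3,
}


def composite_score(attributes, weights, scale=100):
    """Weighted average of attributes in [0, 1], rescaled to 0..scale."""
    total_weight = sum(weights.values())
    weighted = 0.0
    for name, weight in weights.items():
        value = attributes.get(name, 0.0)   # a missing attribute counts as 0
        value = min(max(value, 0.0), 1.0)   # clamp into [0, 1]
        weighted += weight * value
    return round(scale * weighted / total_weight)


# Made-up customer: each attribute already normalized to the range 0..1.
customer = {
    "favorite_sports": 1.0,
    "favorite_teams": 0.5,
    "high_school_sports": 0.0,
    "college_sports": 1.0,
}
print(composite_score(customer, PASSION_WEIGHTS))
```

The same helper could produce an "Athletic Activity" score by passing a different set of attributes and weights, which keeps each score's ingredients visible to the business stakeholders.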

To be honest, this is probably the most enjoyable part of the process as you brainstorm additional data sources and metrics that can be used as part of your score. Again remember, no idea is a bad idea. Let the data science team decide via their analytic modeling which data sources and metrics are the best predictors of business performance.
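
As a rough illustration of letting the modeling decide, the sketch below ranks candidate metrics by the absolute value of their Pearson correlation with a business outcome. Pearson correlation is only one simple screening measure chosen for this example; a real data science team would likely use richer models. The metric names, the `annual_spend` outcome and all values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant metric carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(candidates, outcome):
    """Sort candidate metric names by |correlation| with the outcome."""
    scored = {name: pearson(vals, outcome) for name, vals in candidates.items()}
    return sorted(scored.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up values for five customers; names are hypothetical.
candidates = {
    "sports_team_passion": [10, 20, 30, 40, 50],
    "athletic_activity": [5, 3, 4, 1, 2],
    "store_visits": [2, 2, 2, 2, 2],
}
annual_spend = [100, 200, 300, 400, 500]

ranked = rank_predictors(candidates, annual_spend)
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```

Note that the brainstorming list goes in unfiltered: weak ideas simply sink to the bottom of the ranking instead of being vetoed in the room.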

Step 6: Close The Loop. The final step in the "Thinking Like A Data Scientist" exercise is "closing the loop" with respect to the analytics-driven scores or recommendations that we need to deliver to our key business stakeholders. You can use a simple "Recommendations Worksheet" that ties the decisions that our business stakeholders need to make (in support of the targeted business initiative) to the predictive and prescriptive analytics that we are going to need to build.
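
A worksheet like this can be kept as a small data structure so that gaps are easy to spot. The Python sketch below is one hypothetical layout, not the worksheet used in the course: the `WorksheetRow` fields, the `missing_analytics` helper and the sample decisions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorksheetRow:
    """One row of a hypothetical Recommendations Worksheet."""
    decision: str                                     # decision a stakeholder must make
    predictive: list = field(default_factory=list)    # what we need to predict
    prescriptive: list = field(default_factory=list)  # what we will recommend


def missing_analytics(rows):
    """Return decisions that lack predictive or prescriptive analytics."""
    return [r.decision for r in rows if not (r.predictive and r.prescriptive)]


# Illustrative rows only; the decisions and analytics are made up.
worksheet = [
    WorksheetRow(
        decision="Which customers receive the running-shoe promotion?",
        predictive=["Athletic Activity score"],
        prescriptive=["Recommended offer per customer"],
    ),
    WorksheetRow(
        decision="Which stores stock local team merchandise?",
        predictive=["Sports Team Passion score"],
    ),
]

for decision in missing_analytics(worksheet):
    print("Loop not closed for:", decision)
```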

Last is the creation of the user-experience mockup that validates that we are building the right analytics and have a high-level understanding of where and how to deliver those scores and recommendations (e.g., management dashboards and reports, and operational systems such as the call center, procurement, sales, marketing, finance, etc.).

To get examples of these exercises, you're going to have to enroll in my University of San Francisco "Big Data MBA" course. Sorry, got to save some homework for my students!!

Data scientists are critical to advanced analytics, and I believe that you cannot have too many data scientists. But an important challenge is to get your business users to "think like a data scientist" when contemplating data sources and metrics that might be better predictors of business performance. Having a business organization that can "think like a data scientist" will drive better collaboration with your data science team and, ultimately, lead to better predictive and prescriptive results, and... value to the business.

Okay Big Data MBA class, get ready for your next assignment!!


More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Hitachi Vantara as CTO, IoT and Analytics.

Previously, as a CTO within Dell EMC’s 2,000+ person consulting organization, he worked with organizations to identify where and how to start their big data journeys. He has written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow, where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
