|By Andreas Grabner||
|December 3, 2012 07:30 AM EST||
Swarovski - the leading producer of cut crystal in the world - relies on its eCommerce store as much like other companies in the highly competitive eCommerce environment. Swarovski's story is no different from others in this space: They started with "Let's build a website to sell our products online" a couple of years ago and quickly progressed to "We sell to 60 million annual visitors across 23 countries in six languages." There were bumps along the road and they realized that it takes more than just a bunch of servers and tools to keep the site running.
Why APM and why you don't just need a tool
Swarovski relies on Intershop's eCommerce platform and faced several challenges as they rapidly grew. Their challenges required them to apply Application Performance Management (APM) practices to ensure they could fulfill the business requirements to keep pace with customer growth while maintaining an excellent user experience. The most insightful comment I heard was from René Neubacher, Senior eBusiness Technology Consultant at Swarovski: "APM is not just about software. APM is a culture, a mindset and a set of business processes. APM software supports that."
René recently discussed their Journey to APM, what their initial problems were and what requirements they ended up having on APM and the tools they needed to support their APM strategy. By now they reached the next level of maturity by establishing a Performance Center of Excellence. This allows them to tackle application performance proactively throughout the organization instead of putting out fires reactively in production.
This article describes the challenges they faced, the questions that arose and the new generation APM requirements that paved the way forward in their performance journey:
Swarvoski had traditional system monitoring in place on all their systems across their delivery chain including web servers, application servers, SAP, database servers, external systems and the network. Knowing that each individual component is up and running 99.99% of the time is great but no longer sufficient. How might these individual component outages impact the user experience of their online shoppers? Who is actually responsible for the end user experience and how should you monitor the complete delivery chain and not just the individual components? These and other questions came up when the eCommerce site attracted more customers which was quickly followed by more complaints about their user experience:
APM includes getting a holistic view of the complete delivery chain and requires someone to be responsible for end user experience.
Questions that had no answers
In addition to "Who is responsible in case users complain?" the other questions that needed to be urgently addressed included:
- How often is the service desk called before IT knows that there is a problem?
- How much time is spent in searching for system errors versus building new features?
- Do we have a process to find the root-cause when a customer reports a problem?
- How do we visualize our services from the customer‘s point of view?
- How much revenue, brand image and productivity are at risk or lost while IT is searching for the problem?
- What to do when someone says "it‘s slow"?
The Ten Requirements
These unanswered questions triggered the need to move away from traditional system monitoring and develop the requirements for new generation APM and user experience management.
1: Support State-of-the-Art Architecture
Based on their current system architecture it was clear that Swarovski needed an approach that was able to work in their architecture, now and in the future. The rise of more interactive Web 2.0 and mobile applications had to be factored in to allow monitoring end users from many different devices and regardless of whether they used a web application or mobile native application as their access point.
Transactions need to be followed from the browser all the way back to the database. It is important to support distributed transactions. This approach also helps to spot architectural and deployment problems immediately
2: 100% transactions and clicks - No Averages
Based on their experience, Swarovski knew that looking at average values or sampled data would not be helpful when customers complained about bad performance. Responding to a customer complaint with "Our average user has no problem right now - sorry for your inconvenience" is not what you want your helpdesk engineers to use as a standard phrase. Averages or sampling also hides the real problems you have in your system. Check out the blog post Why Averages Suck by Michael Kopp for more detail.
Measuring end user performance of every customer interaction allows for quick identification of regional problems with CDNs, Third Parties or Latency.
Having 100% user interactions and transactions available makes it easy to identify the root cause for individual users
3: Business Visibility
As the business had a growing interest in the success of the eCommerce platform, IT had to demonstrate to the business what it took to fulfill their requirements and how business requirements are impacted by the availability or the lack of investment in the application delivery chain.
Correlating the number of Visits with Performance on incoming Orders illustrates the measurable impact of performance on revenue and what it takes to support business requirements.
4: Impact of 3rd Parties and CDNs
It was important to not only track transactions involving their own Data Center but all user interactions with their web site - even those delivered through CDNs or third parties. All of these interactions make up the user experience and therefore all of it needs to be analyzed.
Seeing the actual load impact of third-party components or content delivered from CDNs enables IT to pinpoint user experience problems that originate outside their own data center.
5: Across the life cycle - supporting collaboration and tearing down silos
The APM initiative was started because Swarovski reacted to problems happening in production. Fixing these problems in production is only the first step. Their ultimate goal is to become pro-active by finding and fixing problems in development or testing-before they spill over into production. Instead of relying on different sets of tools with different capabilities, the requirement is to use one single solution that is designed to be used across the application lifecycle (Developer Workstation, Continuous Integration, Testing, Staging and Production). It will make it easier to share application performance data between lifecycle stages allowing individuals to not only easily look at data from other stages but also compare data to verify impact and behavior of code changes between version updates.
Continuously catching regressions in Development by analyzing unit and performance tests allows application teams to become more proactive.
Pinpointing integration and scalability issues, continuously, in acceptance and load testing makes testing more efficient and prevents problems from reaching production.
6: Down to the source code
In order to speed up problem resolution Swarovski's operations and development teams require as much code-level insight as possible - not only for their own engineers who are extending the Intershop eCommerce Platform but also for Intershop to improve their product. Knowing what part of the application code is not performing well with which input parameters or under which specific load on the system eliminates tedious reproduction of the problem. The requirement is to lower the Mean Time To Repair (MTTR) from as much as several days down to only a couple of hours.
The SAP Connector turned out to have a performance problem. This method-level detailed information was captured without changing any code.
7: Zero/Acceptable overhead
"Who are we kidding? There is nothing like zero overhead especially when you need 100% coverage!" - Just the words from René when you explained that requirement. And he is right: once you start collecting information from a production system you add a certain amount of overhead. A better term for this would be "imperceptible overhead" - overhead that's so small, you don't notice it.
What is the exact number? It depends on your business and your users. The number should be worked out from the impact on the end user experience, rather than additional CPU, memory or network bandwidth required in the data center. Swarovski knew they had to achieve less than 2% overhead on page load times in production, as anything more would have hurt their business; and that's what they achieved.
8: Centralized data collection and administration
Running a distributed eCommerce application that gets potentially extended to additional geographical locations requires an APM system with a centralized data collection and administration option. It is not feasible to collect different types of performance information from different systems, servers or even data centers. It would either require multiple different analysis tools or data transformation to a single format to use it for proper analysis.
Instead of this approach, a single unified APM system was required by Swarovski. Central administration is equally important as they need to eliminate the need to rely on remote IT administrators to make changes to the monitored system, for example, simple tasks such as changing the level of captured data or upgrading to a new version.
By storing and accessing performance data from a single, centralized repository, enables fast and powerful analytic and visualization. For example, system metrics such as CPU utilization can be correlated with end-user response time or database execution time - all displayed on one single dashboard.
9: Auto-Adapting Instrumentation without digging through code
As the majority of the application code is not developed in-house but provided by Intershop, it is mandatory to get insight into the application without doing any manual code changes. The APM system must auto-adapt to changes so that no manual configuration change is necessary when a new version of the application is deployed.
This means Swarovski can focus on making their applications positively contribute to business outcomes; rather than spend time maintaining IT systems.
10: Ability to extend
Their application is an always growing an ever-changing IT environment. Where everything might have been deployed on physical boxes it might be moved to virtualized environments or even into a public cloud environment.
Whatever the extension may be - the APM solution must be able to adapt to these changes and also be extensible to consume new types of data sources, e.g., performance metrics from Amazon Cloud Services or VMware, Cassandra or other Big Data Solutions or even extend to legacy Mainframe applications and then bring these metrics into the centralized data repository and provide new insights into the application's performance.
Extending the application monitoring capabilities to Amazon EC2, Microsoft Windows Azure, a public or private cloud enables the analysis of the performance impact of these virtualized environments on end user experience.
The Solution and the Way Forward
Needless to say that Swarovski took the first step in implementing APM as a new process and mindset in their organization. They are now in the next phase of implementing a Performance Center of Excellence. This allows them moving from Reactive Performance Troubleshooting to Proactive Performance Prevention.
Stay tuned for more blog posts on the Performance Center of Excellence and how you can build one in your own organization. The key message is that it is not about just using a bunch of tools. It is about living and breathing performance throughout the organization. If you are interested in this check out the blogs by Steve Wilson: Proactive vs Reactive: How to prevent problems instead of fixing them faster and Performance in Development is the Chief Cornerstone.
You think you know what’s in your data. But do you? Most organizations are now aware of the business intelligence represented by their data. Data science stands to take this to a level you never thought of – literally. The techniques of data science, when used with the capabilities of Big Data technologies, can make connections you had not yet imagined, helping you discover new insights and ask new questions of your data. In his session at @ThingsExpo, Sarbjit Sarkaria, data science team lead ...
Jul. 25, 2016 03:45 PM EDT Reads: 936
Extracting business value from Internet of Things (IoT) data doesn’t happen overnight. There are several requirements that must be satisfied, including IoT device enablement, data analysis, real-time detection of complex events and automated orchestration of actions. Unfortunately, too many companies fall short in achieving their business goals by implementing incomplete solutions or not focusing on tangible use cases. In his general session at @ThingsExpo, Dave McCarthy, Director of Products...
Jul. 25, 2016 03:30 PM EDT Reads: 1,681
"delaPlex is a software development company. We do team-based outsourcing development," explained Mark Rivers, COO and Co-founder of delaPlex Software, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 25, 2016 03:00 PM EDT Reads: 1,968
WebRTC is bringing significant change to the communications landscape that will bridge the worlds of web and telephony, making the Internet the new standard for communications. Cloud9 took the road less traveled and used WebRTC to create a downloadable enterprise-grade communications platform that is changing the communication dynamic in the financial sector. In his session at @ThingsExpo, Leo Papadopoulos, CTO of Cloud9, discussed the importance of WebRTC and how it enables companies to focus...
Jul. 25, 2016 02:45 PM EDT Reads: 849
Is your aging software platform suffering from technical debt while the market changes and demands new solutions at a faster clip? It’s a bold move, but you might consider walking away from your core platform and starting fresh. ReadyTalk did exactly that. In his General Session at 19th Cloud Expo, Michael Chambliss, Head of Engineering at ReadyTalk, will discuss why and how ReadyTalk diverted from healthy revenue and over a decade of audio conferencing product development to start an innovati...
Jul. 25, 2016 02:00 PM EDT Reads: 942
Early adopters of IoT viewed it mainly as a different term for machine-to-machine connectivity or M2M. This is understandable since a prerequisite for any IoT solution is the ability to collect and aggregate device data, which is most often presented in a dashboard. The problem is that viewing data in a dashboard requires a human to interpret the results and take manual action, which doesn’t scale to the needs of IoT.
Jul. 25, 2016 01:00 PM EDT Reads: 1,927
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
Jul. 25, 2016 12:15 PM EDT Reads: 438
CenturyLink has announced that application server solutions from GENBAND are now available as part of CenturyLink’s Networx contracts. The General Services Administration (GSA)’s Networx program includes the largest telecommunications contract vehicles ever awarded by the federal government. CenturyLink recently secured an extension through spring 2020 of its offerings available to federal government agencies via GSA’s Networx Universal and Enterprise contracts. GENBAND’s EXPERiUS™ Application...
Jul. 25, 2016 12:00 PM EDT Reads: 1,817
IoT generates lots of temporal data. But how do you unlock its value? You need to discover patterns that are repeatable in vast quantities of data, understand their meaning, and implement scalable monitoring across multiple data streams in order to monetize the discoveries and insights. Motif discovery and deep learning platforms are emerging to visualize sensor data, to search for patterns and to build application that can monitor real time streams efficiently. In his session at @ThingsExpo, ...
Jul. 25, 2016 11:00 AM EDT Reads: 904
Verizon Communications Inc. (NYSE, Nasdaq: VZ) and Yahoo! Inc. (Nasdaq: YHOO) have entered into a definitive agreement under which Verizon will acquire Yahoo's operating business for approximately $4.83 billion in cash, subject to customary closing adjustments. Yahoo informs, connects and entertains a global audience of more than 1 billion monthly active users** -- including 600 million monthly active mobile users*** through its search, communications and digital content products. Yahoo also co...
Jul. 25, 2016 10:30 AM EDT Reads: 329
"There's a growing demand from users for things to be faster. When you think about all the transactions or interactions users will have with your product and everything that is between those transactions and interactions - what drives us at Catchpoint Systems is the idea to measure that and to analyze it," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York Ci...
Jul. 25, 2016 10:30 AM EDT Reads: 1,942
"Tintri was started in 2008 with the express purpose of building a storage appliance that is ideal for virtualized environments. We support a lot of different hypervisor platforms from VMware to OpenStack to Hyper-V," explained Dan Florea, Director of Product Management at Tintri, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 25, 2016 10:15 AM EDT Reads: 1,868
The best-practices for building IoT applications with Go Code that attendees can use to build their own IoT applications. In his session at @ThingsExpo, Indraneel Mitra, Senior Solutions Architect & Technology Evangelist at Cognizant, provided valuable information and resources for both novice and experienced developers on how to get started with IoT and Golang in a day. He also provided information on how to use Intel Arduino Kit, Go Robotics API and AWS IoT stack to build an application tha...
Jul. 25, 2016 10:00 AM EDT Reads: 996
SYS-CON Events announced today that LeaseWeb USA, a cloud Infrastructure-as-a-Service (IaaS) provider, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. LeaseWeb is one of the world's largest hosting brands. The company helps customers define, develop and deploy IT infrastructure tailored to their exact business needs, by combining various kinds cloud solutions.
Jul. 25, 2016 09:45 AM EDT Reads: 1,133
Whether your IoT service is connecting cars, homes, appliances, wearable, cameras or other devices, one question hangs in the balance – how do you actually make money from this service? The ability to turn your IoT service into profit requires the ability to create a monetization strategy that is flexible, scalable and working for you in real-time. It must be a transparent, smoothly implemented strategy that all stakeholders – from customers to the board – will be able to understand and comprehe...
Jul. 25, 2016 09:30 AM EDT Reads: 2,106
The cloud market growth today is largely in public clouds. While there is a lot of spend in IT departments in virtualization, these aren’t yet translating into a true “cloud” experience within the enterprise. What is stopping the growth of the “private cloud” market? In his general session at 18th Cloud Expo, Nara Rajagopalan, CEO of Accelerite, explored the challenges in deploying, managing, and getting adoption for a private cloud within an enterprise. What are the key differences between wh...
Jul. 25, 2016 09:15 AM EDT Reads: 2,012
SYS-CON Events announced today that Venafi, the Immune System for the Internet™ and the leading provider of Next Generation Trust Protection, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Venafi is the Immune System for the Internet™ that protects the foundation of all cybersecurity – cryptographic keys and digital certificates – so they can’t be misused by bad guys in attacks...
Jul. 25, 2016 08:30 AM EDT Reads: 1,280
Large scale deployments present unique planning challenges, system commissioning hurdles between IT and OT and demand careful system hand-off orchestration. In his session at @ThingsExpo, Jeff Smith, Senior Director and a founding member of Incenergy, will discuss some of the key tactics to ensure delivery success based on his experience of the last two years deploying Industrial IoT systems across four continents.
Jul. 25, 2016 06:00 AM EDT Reads: 1,505
There will be new vendors providing applications, middleware, and connected devices to support the thriving IoT ecosystem. This essentially means that electronic device manufacturers will also be in the software business. Many will be new to building embedded software or robust software. This creates an increased importance on software quality, particularly within the Industrial Internet of Things where business-critical applications are becoming dependent on products controlled by software. Qua...
Jul. 25, 2016 05:45 AM EDT Reads: 1,382
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
Jul. 25, 2016 05:00 AM EDT Reads: 2,021