Why Response Times Are Often Measured Incorrectly

Response time measurements and how to interpret them

Response times are in many, if not most, cases the basis for performance analysis. When they are within expected boundaries, everything is fine. When they get too high, we start optimizing our applications.

So response times play a central role in performance monitoring and analysis. In virtualized and cloud environments they are the most accurate performance metric you can get. Very often, however, people measure and interpret response times the wrong way. This is more than reason enough to discuss the topic of response time measurements and how to interpret them. Therefore I will discuss typical measurement approaches, the related misunderstandings and how to improve measurement approaches.

Averaging information away
When measuring response times, we cannot look at each and every single measurement. Even in very small production systems the number of transactions is unmanageable. Therefore measurements are aggregated for a certain timeframe. Depending on the monitoring configuration this might be seconds, minutes or even hours.

While this aggregation helps us to easily understand response times in large volume systems, it also means that we are losing information. The most common approach to measurement aggregation is using averages. This means the collected measurements are averaged and we are working with the average instead of the real values.

The problem with averages is that in many cases they do not reflect what is happening in the real world. There are two main reasons why working with averages leads to wrong or misleading results.

When measurements are highly volatile, the average is not representative of the response times that were actually measured. If our measurements range from 1 to 4 seconds, the average might be around 2 seconds, which certainly does not represent what many of our users perceive.

So averages provide only little insight into real-world performance. Instead of working with averages you should use percentiles. If you talk to people who have been working in the performance space for some time, they will tell you that the only reliable metrics to work with are percentiles. In contrast to averages, percentiles tell you what share of your users experienced response times at or below a certain threshold. If the 50th percentile is, for example, 2.5 seconds, the response times for 50 percent of your users were less than or equal to 2.5 seconds. As you can see, this approach is far closer to reality than using averages.
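The difference between an average and percentiles can be illustrated with a small sketch. The measurement values below are made up for illustration, and the nearest-rank method is only one of several common ways to compute a percentile; a monitoring tool may use a different definition.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest measurement such that
    at least p percent of all measurements are less than or equal to it."""
    if not values:
        raise ValueError("no measurements")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative response times in seconds (assumed values, not measured data).
response_times = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 3.8, 3.9, 4.0]

avg = mean(response_times)
p50 = percentile(response_times, 50)
p90 = percentile(response_times, 90)

print(f"average={avg:.2f}s p50={p50}s p90={p90}s")
```

For this series the average is roughly 2 seconds, although no user actually saw a 2 second response: half of the requests finished within 1.3 seconds, while the slowest ones took close to 4 seconds.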

Percentiles and Average of a Measurement Series


The only potential downside with percentiles is that they require more data to be stored than averages do. While average calculation only requires the sum and count of all measurements, percentiles require a whole range of measurement values as their calculation is more complex. This is also the reason why not all performance management tools support them.
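The storage difference can be sketched as follows. Both class names are hypothetical and the percentile uses the simple nearest-rank method; real tools often use histograms or other approximations to limit memory, which this sketch does not attempt.

```python
import math


class AverageAggregator:
    """Keeps only a running sum and a count: constant memory per interval."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def average(self):
        if self.count == 0:
            raise ValueError("no measurements")
        return self.total / self.count


class PercentileAggregator:
    """Keeps every measurement of the interval: memory grows with traffic."""

    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

    def percentile(self, p):
        if not self.values:
            raise ValueError("no measurements")
        if not 0 < p <= 100:
            raise ValueError("p must be in the range (0, 100]")
        ordered = sorted(self.values)
        rank = math.ceil(p * len(ordered) / 100)  # nearest-rank, 1-based
        return ordered[rank - 1]


# Illustrative response times in milliseconds (assumed values).
avg_agg = AverageAggregator()
pct_agg = PercentileAggregator()
for ms in [200, 400, 600, 4000]:
    avg_agg.add(ms)
    pct_agg.add(ms)
```

After four measurements the average aggregator still holds two numbers, while the percentile aggregator holds all four values and will keep growing until the interval is closed.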

Putting all in a box
Another important question when aggregating data is which data you use as the basis of your aggregations. If you mix together data for different transaction types like the start page, a search and a credit card validation, the results will be of little value because the base data is a mix of apples and oranges. So in addition to ensuring that you are working with percentiles, it is necessary to split transaction types properly so that the data that is the basis for your calculations fits together.

The concept of splitting transactions by their business function is often referred to as business transaction management. While the field of BTM is wide, the basic idea is to distinguish transactions in an application by logical parameters like what they do or where they come from. An example would be a “put into cart” transaction or the requests of a certain user.

Only a combination of both approaches ensures that the response times you measure are a solid basis for performance analysis.
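A minimal sketch of combining both approaches: measurements are first split by transaction type and a percentile is then computed per type. The transaction names come from the example above; the response times and the function name `percentiles_by_transaction` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def percentiles_by_transaction(measurements, p):
    """Group (transaction_type, response_time) pairs by type and
    return the p-th percentile for each type separately."""
    grouped = defaultdict(list)
    for transaction_type, response_time in measurements:
        grouped[transaction_type].append(response_time)
    return {name: percentile(times, p) for name, times in grouped.items()}


# Illustrative response times in seconds (assumed values).
measurements = [
    ("start page", 0.3), ("start page", 0.4), ("start page", 0.5),
    ("search", 1.2), ("search", 1.5), ("search", 2.8),
    ("credit card validation", 3.0), ("credit card validation", 3.5),
]

per_type = percentiles_by_transaction(measurements, 50)
mixed = percentile([time for _, time in measurements], 50)
```

The mixed 50th percentile of 1.2 seconds describes none of the three transaction types well, whereas the per-type values (0.4, 1.5 and 3.0 seconds) can each be compared against an expectation that makes sense for that type.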

Far from the real world
Another point to consider with response times is where they are measured. Most people measure response times at the server-side and implicitly assume that they represent what real users see. While server-side response times are down to 500 milliseconds and everyone thinks everything is fine, users might experience response times of several seconds.

The reason is that server-side response times leave out many of the factors that influence end-user response times. First of all, server-side measurements neglect network transfer time to the end users. This easily adds half a second or more to your response times.

Server vs. Client Response Time


At the same time, server-side response times often only measure the initial document sent to the user. All images, JavaScript and CSS files that are required to render a page properly are not included in this calculation at all. Experts like Steve Souders even say that only 10 percent of the overall response time is influenced by the server side. Even if we consider this an extreme scenario, it is obvious that basing performance management solely on server-side metrics does not provide a solid basis for understanding end-user performance.

The situation gets even worse with JavaScript-heavy Web 2.0 applications where a great portion of the application logic is executed within the browser. In this case server-side metrics cannot be taken as representative for end-user performance at all.

Not measuring what you want to know
A common approach to solve this problem is to use synthetic transaction monitoring. This approach often claims to be “close to the end-user”. Commercial providers offer a huge number of locations around the world from where you can test the performance of pre-defined transactions. While this provides better insight into what the perceived performance of end-users is, it is not the full truth.

The most important thing to understand is how these measurements are collected. There are two approaches to collect this data: via emulators or real browsers. From my very personal perspective any approach that does not use real browsers should be avoided as real browsers are also what your users use. They are the only way to get accurate measurements.

The issue with using synthetic transactions for performance measurement is that it is not about real users. Your synthetic transactions might run pretty fast, but that guy with a slow internet connection who just wants to book a $5,000 holiday (ok, a rare case) still sees 10 second response times. Is it the fault of your application? No. Do you care? Yes, because this is your business. Additionally, synthetic transaction monitoring cannot monitor all of your transactions. You cannot really book a holiday every couple of minutes, so in the end only a portion of your transactions is covered by your monitoring.

This does not mean that there is no value in using synthetic transactions. They are great for keeping you informed about availability or network problems that might affect your users, but they do not represent what your users actually see. As a consequence, they do not serve as a solid basis for performance improvements.

Measuring at the End-User Level
The only way to get real user performance metrics is to measure from within the users’ browsers. There are two approaches to do this. You can use a tool like the free dynaTrace Ajax Edition, which uses a browser plug-in to collect performance data, or inject JavaScript code to get performance metrics. The W3C now also has a number of standardization activities for browser performance APIs, among them the Navigation Timing Specification, which is already supported by recent browsers, and the Resource Timing Specification. Open-source implementations like Boomerang provide a convenient way to access performance data within the browser. Products like dynaTrace UEM go further by providing a highly scalable backend and full integration into your server-side systems.

The main idea is to inject custom JavaScript code which captures timing information like the beginning of a request, DOM ready and fully loaded. While these events are sufficient for “classic” web applications they are not enough for Web 2.0 applications which execute a lot of client-side code. In this case the JavaScript code has to be instrumented as well.
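What a backend might do with such timing information can be sketched as follows. The sketch assumes the injected script sends a beacon whose fields carry timestamps named after attributes of the W3C Navigation Timing interface (`navigationStart`, `responseStart`, `domContentLoadedEventEnd`, `loadEventEnd`); the beacon format, the function name and the metric names are hypothetical and not taken from any particular product.

```python
def page_timings(beacon):
    """Derive durations in milliseconds, relative to the start of navigation,
    from a beacon of Navigation Timing style timestamps (epoch milliseconds)."""
    start = beacon["navigationStart"]
    marks = {
        "time_to_first_byte_ms": "responseStart",
        "dom_ready_ms": "domContentLoadedEventEnd",
        "fully_loaded_ms": "loadEventEnd",
    }
    timings = {}
    for metric, attribute in marks.items():
        value = beacon.get(attribute, 0)
        # Navigation Timing reports 0 for events that have not happened yet,
        # so such marks are returned as None instead of a negative duration.
        if value > 0 and value >= start:
            timings[metric] = value - start
        else:
            timings[metric] = None
    return timings


# Illustrative beacon (assumed values): the load event has not fired yet.
beacon = {
    "navigationStart": 1000,
    "responseStart": 1480,
    "domContentLoadedEventEnd": 2900,
    "loadEventEnd": 0,
}

timings = page_timings(beacon)
```

Timings for client-side code in Web 2.0 applications are not covered by these marks; they would have to be reported by additional instrumentation in the JavaScript code itself.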

Is it enough to measure on the client-side?
The question now is whether it is enough to measure performance from the end-user perspective. If we know how our web application performs for each user we have enough information to see whether an application is slow or fast. If we then combine this data with information like geo location, browser and connection speed we know for which users a problem exists. So from a pure monitoring perspective this is enough.

In case of problems, however, we want to go beyond monitoring. Monitoring only tells us that we have a problem but does not help in finding the cause of the problem. Especially when we measure end-user performance our information is less rich compared to development-centric approaches. We could still use a development-focused tool like dynaTrace Ajax Edition for production troubleshooting. This however requires installing custom software on an end user’s machine. While this might be an option for SaaS environments this is not the case in a typical eCommerce scenario.

The only way to gain this level of insight for diagnostic purposes is to collect information from the browser as well as the server side to get a holistic view of application performance. As discussed, using averaged metrics is not enough in this case. Aggregated data does not provide the insight we need. So instead of aggregated information, we need a way to identify the requests of a user’s browser and relate them to server-side requests.
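One way to relate browser actions to server-side requests is a shared identifier. The sketch below assumes that each browser action carries an `action_id` and that every server-side request it triggers is tagged with the same id; the field names and the function `correlate` are hypothetical, and how the id travels to the server (for example in a header or cookie) is left open.

```python
from collections import defaultdict


def correlate(browser_actions, server_requests):
    """Attach to each browser action the server-side requests that share
    its action_id, so a slow action can be drilled down to the server."""
    server_by_action = defaultdict(list)
    for request in server_requests:
        server_by_action[request["action_id"]].append(request)

    correlated = []
    for action in browser_actions:
        matched = server_by_action.get(action["action_id"], [])
        durations = [request["duration_ms"] for request in matched]
        correlated.append({
            "action_id": action["action_id"],
            "page": action["page"],
            "client_ms": action["duration_ms"],
            "server_request_count": len(matched),
            # None when no server-side request was recorded for the action.
            "slowest_server_ms": max(durations) if durations else None,
        })
    return correlated


# Illustrative records (assumed values).
browser_actions = [
    {"action_id": "a1", "page": "/search", "duration_ms": 4200},
    {"action_id": "a2", "page": "/cart", "duration_ms": 900},
]
server_requests = [
    {"action_id": "a1", "path": "/search", "duration_ms": 450},
    {"action_id": "a1", "path": "/suggest", "duration_ms": 120},
    {"action_id": "a2", "path": "/cart", "duration_ms": 300},
]

result = correlate(browser_actions, server_requests)
```

In this made-up data the search action takes 4.2 seconds in the browser while its slowest server-side request takes 450 milliseconds, which points the analysis away from the server and toward the network or the browser.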

Client/Server Drill Down of Pages and Actions


The figure below shows an architecture based on (and abstracted from) dynaTrace UEM, which provides this functionality. It shows the combination of browser and server-side data capturing on a transactional basis and a centralized performance repository for analysis.

 

Architecture for End-To-End User Experience Monitoring


Conclusion
There are many options for where and how to measure response times. Depending on what we want to achieve, each of them provides more or less accurate data. For the analysis of server-side problems, measuring at the server side is enough. We have to be aware, however, that this does not reflect the response times of our end users. It is a purely technical metric for optimizing the way we create content and service requests. The prerequisite for meaningful measurements is that we separate different transaction types properly.

Measurements from anything but the end-user’s perspective can only be used to optimize your technical infrastructure and only indirectly the performance of end users. Only performance measurements in the browser enable you to understand and optimize user-perceived performance.


More Stories By Alois Reitbauer

Alois Reitbauer works as a Technology Strategist for dynaTrace Software, where he leads the Methods and Technology team. As part of the R&D team he influences the dynaTrace product strategy and works closely with key customers in implementing performance management solutions for the entire lifecycle. Alois has 10 years of experience as an architect and developer in the Java and .NET space. He is a frequent speaker at technology conferences on performance and architecture related topics and regularly publishes articles and blog posts on blog.dynatrace.com.
