By Andreas Grabner
April 11, 2013 03:25 PM EDT
We have been blogging about the same problems and problem patterns we see while working with our customers over the past few years. There have always been the classic application performance landmines in the areas of inefficient database access, misconfigured frameworks, excessive memory usage, bloated web pages and not following common web performance best practices, among others.
More than two years ago we posted summary blogs of the Top Server-Side Performance Problems and the Top 10 Client-Side Performance Problems to give operations, architects, testers and developers easy-to-consume best practices. We feel that it is time to provide an update to these best practices as new problem patterns have since come into play. We also want to cover more than just problems that happen within your application by broadening the scope across the entire Application Delivery Chain. This includes all components between your end user and your back-end systems, databases and third-party services. The following illustrates which components are involved and what the typical errors are along the delivery chain.
Delivering an application to the end user has become more complex as it involves more components than ever before. This also leaves a lot of room for mistakes that impact end-user experience.
Let's now dig a little deeper into some of the highlighted problem areas. The following are the top performance landmines reported by our customers, including BonTon and Swarovski as well as companies in the financial services, manufacturing and energy industries, among others. To make it easier for you to decide which landmines to read, we have also listed the target audience for each problem area.
Bloated Web Front Ends
Audience: Operations, Architects, Testers, Developers
Often companies focus on optimizing the performance of the applications they deliver by tuning the code, reducing SQL overhead, implementing application caching, and other items that are, for the most part, invisible to the customer using the application. However, all of this effort and activity can go completely unnoticed if the content being delivered to customers is bloated and inefficient.
Sources we track show that the average page delivered to customers has steadily grown in size and complexity over the last 3-4 years, and so have customers' expectations of performance. This ongoing conflict between business requirements and customer expectations needs to be understood in order to be managed effectively. What companies need to realize is that what they consider fast and efficient doesn't really matter. If customers believe that the site is slow and hard to use, they won't use it, and they will tell their friends about their poor experience.
Comparing your performance to top competitors in your industry as well as Internet leaders helps you set performance goals that can be achieved over time. Additionally, understanding why your customers leave your site can help you resolve customer experience issues: Is it a particular subset of customers who leave? Which page caused them to leave? Is there an application function on that page that is bloated and slow?
Comparing your site against peers in the same industry will help you understand where you rank.
Using caching, compression, CDNs, and a critical eye that asks questions about every new image, function, and feature you add, you can trim the weight of your site and deliver a better customer experience.
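As a rough illustration of what compression buys, the sketch below gzips a text response body and reports the raw and compressed sizes using only Python's standard library. The repeated HTML snippet is a made-up example. Actual savings depend on your content, and images or video are usually already compressed, so they benefit far less.

```python
import gzip


def transfer_sizes(body: bytes) -> tuple[int, int]:
    """Return (raw_size, gzip_size) in bytes for a response body."""
    compressed = gzip.compress(body, compresslevel=6)
    return len(body), len(compressed)


# Hypothetical HTML fragment: markup is highly repetitive, so it compresses well.
html = ("<div class='product'><span class='price'>19.99</span></div>\n" * 500).encode("utf-8")
raw, packed = transfer_sizes(html)
print(f"raw={raw} B, gzip={packed} B, saved={100 * (1 - packed / raw):.1f}%")
```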
We discuss the performance degradation that can be traced to bloated front ends and how this affects site performance in Performance Improvement is not Performance Optimization and Super Bowl Sunday 2013 - Winners, Losers, and Casualties.
Slow Third-Party Content and CDNs
Audience: Operations, Architects, Testers
Focusing on your own content can leave you exposed to performance issues that originate outside your organization. With companies adding more content from third-party sources to their site, managing application performance becomes increasingly complex, even when these services are designed to improve performance.
During peak traffic events over the last 12 months (the holiday shopping season and the Super Bowl), two primary trends emerged: third-party services were overwhelmed when more than one of their customers reached peak traffic simultaneously, and CDNs buckled under flash loads far larger than even the busiest days their customers typically experience.
Monitoring and managing third parties means treating them as unique applications, with their own baselines and Service Level Agreements (SLAs) and Service Level Objectives (SLOs). It sometimes means asking tough questions of these services, such as:
- Have you load tested your systems to see what happens when three of your largest customers experience peak traffic simultaneously?
- What is the escalation path we should follow with your team when we discover a performance issue that is affecting our customers?
- How well did your system perform during the eight busiest hours over the last 12 months, not just the average performance?
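The last question matters because an average can hide exactly the hours you care about. Below is a minimal sketch, assuming you collect one `(request_count, avg_response_ms)` pair per hour for the service from your own monitoring (a hypothetical data shape). It compares the plain hourly average with the average over the busiest hours:

```python
def peak_vs_average(hourly, busiest_n=8):
    """hourly: list of (request_count, avg_response_ms) tuples, one per hour.

    Returns (mean response time over all hours,
             mean response time over the `busiest_n` hours with the most requests).
    """
    if not hourly:
        raise ValueError("need at least one hour of data")
    overall = sum(ms for _, ms in hourly) / len(hourly)
    # Rank hours by traffic, not by response time: "how fast were you when busy?"
    busiest = sorted(hourly, key=lambda hour: hour[0], reverse=True)[:busiest_n]
    peak = sum(ms for _, ms in busiest) / len(busiest)
    return overall, peak


# Hypothetical day: 16 quiet hours at 200 ms and 8 peak hours at 1,500 ms.
day = [(100, 200)] * 16 + [(5000, 1500)] * 8
overall_ms, peak_ms = peak_vs_average(day)
print(f"average: {overall_ms:.0f} ms, busiest 8 hours: {peak_ms:.0f} ms")
```

For this made-up data, the average is about 633 ms while the busiest hours run at 1,500 ms, which is the gap a vendor's "average performance" figure can hide.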
Monitor the impact of slow third-party and CDN content on your page load time.
Finally, your team needs to be prepared for the scenario where a third-party service or CDN suffers a severe outage or begins to seriously degrade your site performance. Always have a Plan B, C, etc. that gives you the ability to mitigate the issue. These plans could include removing third-party tags, images, and content from your site entirely during peak traffic, load balancing between multiple CDNs, moving content to a secondary cloud provider, all the way to switching to a simple bare bones site that removes all rich media until traffic returns to a normal level.
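One common way to automate part of such a Plan B is a circuit breaker around calls to a third party. After repeated failures or timeouts, it stops calling the service for a while and serves a fallback instead (for example, omitting the widget). This general-purpose pattern is swapped in for illustration rather than prescribed by this post, and `fetch_recommendations` and the thresholds below are hypothetical:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures the
    breaker opens and calls go straight to the fallback until `reset_after`
    seconds have passed; then one trial call is allowed (half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: don't wait on the slow third party at all
            # Half-open: allow one trial call; a single failure reopens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:  # timeouts, connection errors, bad responses
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


def fetch_recommendations():
    # Hypothetical third-party call that is currently timing out.
    raise TimeoutError("vendor did not answer in time")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
for _ in range(3):
    widget = breaker.call(fetch_recommendations, fallback=lambda: "<!-- recommendations disabled -->")
```

The fallback keeps the page rendering while the vendor is degraded. The injectable `clock` exists only to make the behavior easy to test.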
Unless you know how third parties affect your performance, there is no way for you to manage them effectively. Once you manage your third parties, you can take control of all aspects of your site performance.
More on third-party services and their effects on application performance is covered in: You only control 1/3 of your Page Load Performance!, Third Party Content Management applied: Four steps to gain control of your Page Load Performance!, The Ripple Effect of Facebook's Outage, Third-Party Issues and the Performance Ripple Effect, and Website's Vulnerability to Third-Party Services Exposed.
We also discuss third parties, most notably CDN performance in: Super Bowl Sunday 2013 - Winners, Losers, and Casualties, and Why Bon Ton needs real-time visibility into 85% of its content delivered by Akamai.
Wrong Usage of Frameworks
Audience: Architects, Developers
The following screenshot shows Hibernate executing the same SQL query multiple times instead of caching the result of the first query. This happens when Hibernate has not been configured to perform optimally for your specific needs:
Loading a person two times in a row, but no session cache involved
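Hibernate's first-level (session) cache means that loading the same entity by id twice within one session normally issues only one SQL query. Loading it in separate sessions, or through a query that bypasses that cache, goes back to the database. The toy model below is not Hibernate code. It is a minimal Python sketch of the identity-map idea, with a hypothetical `CountingDb` that counts queries:

```python
class CountingDb:
    """Hypothetical data source that counts how many SELECTs it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def select(self, entity, key):
        self.queries += 1
        return dict(self.rows[(entity, key)])  # fresh copy, like a newly mapped object


class Session:
    """Toy session with a first-level cache (identity map); not the Hibernate API."""

    def __init__(self, db):
        self.db = db
        self._identity_map = {}

    def get(self, entity, key):
        cache_key = (entity, key)
        if cache_key not in self._identity_map:
            self._identity_map[cache_key] = self.db.select(entity, key)
        return self._identity_map[cache_key]


db = CountingDb({("Person", 1): {"id": 1, "name": "Alice"}})

same_session = Session(db)
first = same_session.get("Person", 1)
second = same_session.get("Person", 1)  # served from the identity map, no query
print(db.queries, first is second)      # prints: 1 True

Session(db).get("Person", 1)            # new session: its cache starts empty
print(db.queries)                       # prints: 2
```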
Finally, frameworks are constantly updated to improve not only functionality but also performance and stability. Watch out for these updates and upgrade the framework version you use so you benefit from the improvements. We have seen cases where, e.g., jQuery was never updated, leaving websites with poor performance on older browsers and sometimes even on newer ones, because older versions of jQuery didn't leverage the capabilities of the latest IE, FF, Chrome or Safari browsers.
Long-running CSS class name lookups account for about 80% of the client-side load time.
If you want to read more about common problems when using these types of frameworks check out our blogs series on Hibernate (The Session Cache, The Query Cache, Second Level Cache), the Top SharePoint Performance Mistakes or the 101 on jQuery Selector Performance.
Network Infrastructure Problems
Audience: Operations, Architects, Testers
Network infrastructure is an important component of every successful business operation. Performance problems experienced by end users can have various origins, so operations teams need Application Performance Monitoring solutions that let them isolate fault domains quickly and easily.
Sometimes the answer is not obvious and performance problems can end up in a "war room" between infrastructure and application providers. The team needs to analyze whether the problem is present at all locations where the application is executed. In certain cases, the performance problems might be caused by external infrastructure used by some users.
Performance problems can be costly. According to a report by the Aberdeen Group, they can reduce revenue by 9% and productivity by 64%. When services are based on SAP infrastructure, downtime can cost as much as $15,000 per minute. Even though SAP provides tools to monitor its components, a proper APM solution should deliver a holistic view of the entire infrastructure. Only then can the Operations team tell whether the problem lies in the SAP components, which were a significant investment to deploy, or in the infrastructure, unrelated to SAP or any other application.
Overview of SAP tier with top most under-performing modules and most affected users
The most obvious hints on whether this is a network or an application problem can be seen by checking for the Network and Server time outliers compared to the values of the baseline traffic. But eyeballing the reports is not enough to avoid problems. The first step toward proactive application performance management is to learn to respond promptly to alerts triggered by the APM tool when key measures go outside of the usual range.
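Here is a minimal sketch of that comparison, assuming baseline samples of network time and server time are available. The three-standard-deviation threshold is an arbitrary choice for illustration, not how any particular APM tool computes its baselines:

```python
from statistics import mean, stdev


def above_baseline(baseline_ms, current_ms, k=3.0):
    """True if current_ms exceeds the baseline mean by more than k standard deviations."""
    return current_ms > mean(baseline_ms) + k * stdev(baseline_ms)


def likely_fault_domain(base_network_ms, base_server_ms, network_ms, server_ms, k=3.0):
    """Rough triage: which time component is outside its usual range?"""
    network_bad = above_baseline(base_network_ms, network_ms, k)
    server_bad = above_baseline(base_server_ms, server_ms, k)
    if network_bad and server_bad:
        return "network and application"
    if network_bad:
        return "network"
    if server_bad:
        return "application"
    return "within baseline"


# Hypothetical baseline samples in milliseconds (stdev needs at least two values).
baseline_network = [20, 22, 19, 21, 20]
baseline_server = [100, 110, 95, 105, 100]
print(likely_fault_domain(baseline_network, baseline_server, network_ms=80, server_ms=102))
```

In this made-up case, network time jumped to four times its baseline while server time stayed normal, which points the war room at the network rather than the application.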
Audience: Operations, Architects
"The Cloud" comes with a great promise: endless resources for endless scalability and performance when I need it. This eliminates the need to buy a lot of hardware that sits idle most of the time and is only needed during peak traffic periods. It also allows me to scale far beyond what was expected without waiting for additional hardware to ship.
But there are some gotchas: throwing hardware at an application that is not designed to scale in a cloud environment won't leverage the possibilities the cloud provides. In fact, it often ends up being a very costly endeavor. One must also understand that The Cloud - unless we talk about a private cloud setting - is an environment that you do not own. Direct access to the underlying hardware is not as easy as when the hardware sits in the next room, which makes troubleshooting and monitoring much harder. The cloud is also not just an endless resource pool of on-demand CPU, memory or disk. It provides many other services, such as storage and messaging, which one must understand and monitor for performance, as these services are key components of your application.
Monitor cloud instance usage and cost live to avoid falling into a cost trap.
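A back-of-the-envelope version of such a check multiplies the currently running instances by their hourly price and projects the result to a month. The instance types, prices and budget below are hypothetical, and real cloud bills also include storage, traffic and other services:

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


def projected_monthly_cost(running, hourly_rate):
    """running: instance type -> number of instances currently running.
    hourly_rate: instance type -> price per instance-hour."""
    per_hour = sum(count * hourly_rate[kind] for kind, count in running.items())
    return per_hour * HOURS_PER_MONTH


def over_budget(running, hourly_rate, monthly_budget):
    """Return (projected monthly cost, whether it exceeds the budget)."""
    cost = projected_monthly_cost(running, hourly_rate)
    return cost, cost > monthly_budget


# Hypothetical prices and fleet; take real numbers from your provider's billing data.
rates = {"small": 0.10, "large": 0.40}
fleet = {"small": 10, "large": 5}
cost, alert = over_budget(fleet, rates, monthly_budget=2000)
print(f"projected: ${cost:,.2f}/month, over budget: {alert}")
```

Run periodically against live instance counts, a check like this catches an autoscaling group that scaled up for a peak and never scaled back down.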
Relating to these problem areas you want to read the following blog posts: Managing Hybrid Cloud Environments, Analyzing Performance of Windows Azure Storage, Why Performance Monitoring is easier in Public than onPremise Clouds and Monitoring your Clouds.
Too Many Database Calls
Audience: Architects, Testers, Developers
Database access is the problem we see most often within the application. It is nothing new, but because we still see it in almost every application we work with, it is critical enough to mention again. The first lesson learned is that the blame often lies not with the database but with the application's access patterns. All too often we see a single web request that executes thousands of SQL statements. There are multiple reasons for this: fetching more data than is actually needed, or fetching data inefficiently and then aggregating and computing it in the application rather than in a stored procedure. What is really interesting is that we see this problem pattern not only in distributed applications running on modern application servers but also in "legacy" applications such as VB6 or even on the mainframe. The following screenshot highlights the transaction flow of an enterprise application that calls the mainframe. The mainframe transaction makes 225 SQL executions per transaction. A closer look typically reveals that the same statements are called hundreds of times for the reasons mentioned above:
The Transaction Flow highlights how services interact with each other including the number of interactions to DB2 which indicate a potential architectural and performance problem.
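The access pattern described above can be reproduced in a few lines with Python's built-in `sqlite3` module. One function issues a query per customer and sums the totals in the application, while the alternative lets the database filter and aggregate in a single statement. SQLite has no stored procedures, so a `GROUP BY` query stands in for pushing the computation into the database. The `orders` schema and data are made up, and `set_trace_callback` is used only to count executed statements:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    # Hypothetical data: 100 customers with three orders each.
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(cid, 10.0 * n) for cid in range(1, 101) for n in range(1, 4)],
    )
    conn.commit()
    return conn


def totals_one_by_one(conn, customer_ids):
    # Anti-pattern: one round trip per customer, aggregation done in the application.
    result = {}
    for cid in customer_ids:
        rows = conn.execute("SELECT total FROM orders WHERE customer_id = ?", (cid,)).fetchall()
        result[cid] = sum(row[0] for row in rows)
    return result


def totals_in_one_query(conn, customer_ids):
    # One statement: the database filters and aggregates.
    placeholders = ",".join("?" * len(customer_ids))
    sql = (f"SELECT customer_id, SUM(total) FROM orders "
           f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id")
    return dict(conn.execute(sql, customer_ids).fetchall())


def count_statements(conn, fn, *args):
    """Run fn(conn, *args) and count the SQL statements SQLite executed."""
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        result = fn(conn, *args)
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)


conn = make_db()
customers = list(range(1, 51))
slow, slow_count = count_statements(conn, totals_one_by_one, customers)
fast, fast_count = count_statements(conn, totals_in_one_query, customers)
print(slow_count, fast_count, slow == fast)
```

Both functions return the same totals. The difference is 50 round trips versus one, a gap that grows with every customer added to the request.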
Besides these access pattern problems, we also see individual statements that take a long time to execute. In this case it is important not to focus only on optimizing statements in the database by tweaking indices or the like; it is also important to analyze whether these queries can be optimized from within the application. We often see that too much data is retrieved from the database, which first gets parsed by the application (using extra memory) and is then thrown away (causing more GC activity). Another landmine is misconfigured connection pools or application code that holds on to connections too long and ends up blocking other threads from accessing the database.
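For the connection-holding landmine, the key habits are to scope a connection to the smallest possible block and to bound how long a caller waits for one. The pool below is a minimal, hypothetical sketch built on `queue.Queue`, not the API of any particular driver or application server:

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool; `factory` creates one connection (hypothetical here)."""

    def __init__(self, factory, size=2, timeout=1.0):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())
        self.timeout = timeout

    @contextmanager
    def connection(self):
        # Wait at most `timeout` seconds; raises queue.Empty if the pool is exhausted,
        # so exhaustion shows up as an error instead of threads silently blocking.
        conn = self._idle.get(timeout=self.timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(factory=object, size=2, timeout=0.05)
with pool.connection() as conn:
    pass  # do the database work here, then release immediately
```

Keeping slow, non-database work (rendering, remote calls) outside the `with` block is what prevents one request from starving the others.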
The following screenshot shows the database queries executed by a single transaction, most of them taking very long to execute. The fix to this problem was to optimize these statements in both the application and in the database:
The architects in this case started by optimizing SQL statements that took a long time to execute and those that got executed several times within the same transaction.
For further reading check out our blogs with more detailed background on these problem patterns such as Don't let your load balancers ruin your holiday business or Saving MIPS and Money. For connection pool problems we also have one interesting blog named The reason I don't monitor connection pool usage.
Big Data Not Optimized
Audience: Operations, Architects, Testers, Developers
The amount of data that we and our applications have to process is constantly growing. Big Data solutions (NoSQL, MapReduce...) provide new approaches to storing and processing large amounts of data. But as with every technology, they need to be used in an optimized way that fits your specific needs. It is a misconception that you can speed up data processing simply by adding resources to, e.g., a MapReduce cluster. This only works if you have implemented your jobs in a way that allows them to scale. The same is true for accessing data in a NoSQL database: the problems we see with relational databases also apply to Big Data solutions. If you make inefficient queries or more queries than necessary, you are going to impact performance.
The following screenshot highlights a transaction that spends most of its time in MongoDB. A closer look into this revealed that the framework used to access MongoDB made a call to a size method of the cursor that then executed an additional query to MongoDB, which was totally unnecessary. In this example, eliminating that call reduced roundtrips to MongoDB and improved overall transaction performance by 15x:
Transactions that call JourneyCollection.getCount spend nearly half their time in MongoDB.
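To make the round-trip cost visible, here is a hypothetical stand-in for a document-store client (not the MongoDB driver API) in which every method call counts as one round trip. It contrasts issuing a separate count with deriving the size from the results that were fetched anyway:

```python
class RoundTripCounter:
    """Hypothetical document-store client (not the MongoDB driver API):
    every method call counts as one network round trip."""

    def __init__(self, docs):
        self.docs = docs
        self.round_trips = 0

    def find(self, predicate):
        self.round_trips += 1
        return [doc for doc in self.docs if predicate(doc)]

    def count(self, predicate):
        self.round_trips += 1
        return sum(1 for doc in self.docs if predicate(doc))


def journeys_with_extra_count(db, user):
    total = db.count(lambda doc: doc["user"] == user)  # extra round trip just to learn the size
    docs = db.find(lambda doc: doc["user"] == user)
    return docs, total


def journeys_single_query(db, user):
    docs = db.find(lambda doc: doc["user"] == user)
    return docs, len(docs)  # size of the data that was already fetched
```

Deriving the size locally only makes sense when you need all matching documents anyway. If you only need the number, or you page through results with a limit, a server-side count is the right tool.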
If you are using Big Data technologies such as Cassandra, MongoDB, Hadoop or the like, I suggest reading the following blog posts, which explain some of the problem patterns and highlight best practices: MongoDB Anti-Pattern, NoSQL vs Traditional Databases, Inside Cassandra Write Performance and What we can Learn from Cassandra Pagination. Also check out 15x Performance Improvements for Pig+HBase.
Undetected Memory Leaks
Audience: Architects, Testers, Developers
Memory and garbage collection problems are still very prominent issues in enterprise applications. One reason is that the very nature of garbage collection is often misunderstood. Besides the traditional memory-related problems such as high memory usage and wrong caching strategies, we also see memory issues related to class loading, large classes or native memory. The following screenshot shows a single object consuming a lot of memory. This is not necessarily a bad idea when it is needed, but too often information is kept in memory for no apparent reason, consuming memory that is then unavailable to the rest of the application.
Single Object that is responsible for a big portion of the memory being leaked
Traditional memory leaks often lead to out of memory exceptions and typically to crashes of the virtual machines. This has a negative impact on the end user as the current context of user sessions and active transactions might be lost.
High memory usage, on the other hand, can result in heavy garbage collection activity, which has a direct impact on end-user response time. Transactions that are suspended by long-running garbage collection can be sped up by tweaking garbage collection settings and by being less "wasteful" with memory.
Even wrong implementations of equals/hashCode can lead to memory problems. To address these issues, we wrote a full chapter on Memory Management in our Java Enterprise Performance book that explains concepts like How Garbage Collection works, Difference between JVMs, GC Tuning, High Memory Usage and the Root Cause, Class Load Related Problems and more. We have also blogged about specific memory scenarios; check out the following blogs: Memory Monitoring in WebSphere Environments, GC Bottlenecks in Heterogeneous Environments, Leak Detection in Production Environments, Top Memory Problems - Part I and Part II.
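To see why a broken equals/hashCode contract turns into a memory problem, consider a cache keyed by objects whose equality and hash disagree. Logically equal keys never find each other, so every request adds a new entry. The sketch uses Python's analogous `__eq__`/`__hash__` contract to stay consistent with the other examples here. `BrokenKey`, `FixedKey` and `fill_cache` are hypothetical names:

```python
class BrokenKey:
    """Value-based __eq__ but identity-based __hash__: equal keys get different
    hashes, so dictionary lookups never match an existing entry."""

    def __init__(self, user_id):
        self.user_id = user_id

    def __eq__(self, other):
        return isinstance(other, BrokenKey) and self.user_id == other.user_id

    def __hash__(self):
        return id(self)  # BUG: inconsistent with __eq__


class FixedKey:
    def __init__(self, user_id):
        self.user_id = user_id

    def __eq__(self, other):
        return isinstance(other, FixedKey) and self.user_id == other.user_id

    def __hash__(self):
        return hash(self.user_id)  # equal objects produce equal hashes


def fill_cache(key_type, requests):
    """Simulate `requests` lookups for the same logical key; return the cache size."""
    cache = {}
    for _ in range(requests):
        key = key_type(42)                   # same logical key every time
        cache.setdefault(key, "profile-42")  # should be a hit after the first request
    return len(cache)


print(fill_cache(BrokenKey, 1000), fill_cache(FixedKey, 1000))
```

With the broken key, the cache holds one entry per request and keeps every key object alive, which looks exactly like a slow memory leak under load.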
More to Come...
These landmines are some highlights with links to more detailed blog posts. As we continue to blog about these problem patterns, we plan to compile a second list of problems later this year. Keep watching our blog for more information and check out our online book on Java Enterprise Performance.