Welcome!

Java IoT Authors: Yeshim Deniz, Roger Strukhoff, Carmen Gonzalez, Elizabeth White, Liz McMillan

Related Topics: Java IoT, Industrial IoT, Open Source Cloud, IoT User Interface, Apache, SDN Journal

Java IoT: Article

Tracing Black Boxes III: Solr Query Performance Tuning

Solr Query Performance Tuning

Solr Server provides JMX statistics that show performance details such as query speed and cache hit/miss rates in a macro level, which James talked about in a previous post. However, it might be tough to trace how a particular operation; for example: how a specific query fared in the system. This week, I'd like to introduce TraceView's latest support for Solr Server, which provides breakdown of each operation, enabling more precise performance monitoring and troubleshooting.

How does Solr work, anyway?

Requests made to Solr Server can be roughly divided into 3 categories: queries, data indexing/update, admin operations (check server health/log, optimize server etc). Requests made to Solr server first go through the SolrDispatchFilter that looks up the corresponding Solr Core (a running instance of index/dataset along with configurations) and handler. The retrieved SolrCore and handler are then used to process the request. For SearchHandler, it is further broken down into smaller SearchComponents (for example, querying, highlighting results, calculating statistics etc).

Take note that since version 4, Solr added distributed capabilities in Solr called SolrCloud, which enable highly available, fault tolerant cluster of Solr servers. Index/dataset is no longer hosted on a single Core/instance. Instead, it is constructed by several shards, each as a disjoint subset of the documents in the index, and each shard can be backed by multiple cores, which are replicas of each other.

For caching, Solr uses several built-in classes (LRUCache, FastLRUCache, LFUCache) for various aspects (field, filter, query result, document etc). They are implemented as in-memory caches to allow quicker operations. The configurations of caches can be found in the Query section of solrconfig.xml.

Indexing and updates can be slow, and certainly compete for resources, but they can be done asynchronously. Admin operations can be slow as well, but they are typically one-time events, or not performance sensitive. Queries are typically the most important operation, as they live in the critical path for the user. TraceView monitor activities on each of the classes that correspond to these actions and their components to extract run-time information on each of the modules described. Let's take a look at an example.

Tracing Queries
A trace follows the path of a request through the entire application. For example, a trace below shows the breakdowns of a query operation:

solr3-request

There's a lot more going on that just searching the index for the given phrase!

Solr has a lot more functionality than just returning a list of documents that match a query. In particular, this query's time is actually dominated by "highlight.process" (highlight matching phrases in the query result) and "ResponseWriter.write" (write data to the response stream). We could speed up this request by nearly 2x by just disabling the highlighter in config!

The other interesting information here is what's not visible. This query didn't hit a cache of any sort. Even for highly variable search terms, the cache can help with users paging through longer lists or sharing links. In this case, TraceView found several cache miss events on "documentCache" that contribute to longer load time.

solr3-cache-info

Pre-warming the cache could help with this, or if this is a common term, it's possible that it's being evicted due to small cache size. We could verify this by checking the JMX statistics for cache eviction rates and adjusting the cache size in our config, if necessary.

Beyond Solr
Instrumentation on Solr Server does not only give an isolated view of how Solr performs, it also draws a complete picture of all the interactions with other applications/systems. For example, below is trace of a query request issued a typical Drupal search page backed by Solr service runs on a different host:

solr3-request-full

We can trace the request handling all the way through the Apache server, Drupal modules, and Solr handling all in one trace. In particular, this tracks requests even when the app and Solr are running on different machines, which allows you to track down issues where the application calls the Solr server incorrectly.

More Stories By Patson Luk

A Java developer who has spent the better part of the last decade working on financial services applications with companies from HSBC to Mobilearth and Parasoft, Patson is experienced in various aspects of computer systems, from large scale enterprise banking system to lightweight mobile payment solutions. He now leads Java instrumentation and tool development for the TraceView product at AppNeta. Patson's focus is on using java bytecode manipulation technologies to gain greater visibility into the full spectrum of Java based technologies. This includes higher level application frameworks from Spring and Struts to Webflow, AppServers from TomCat to JBoss, and Databases from MySQL to Oracle. He's also writes frequently on the AppNeta blog - www.appneta.com/blog

@ThingsExpo Stories
"We're a cybersecurity firm that specializes in engineering security solutions both at the software and hardware level. Security cannot be an after-the-fact afterthought, which is what it's become," stated Richard Blech, Chief Executive Officer at Secure Channels, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
In addition to all the benefits, IoT is also bringing new kind of customer experience challenges - cars that unlock themselves, thermostats turning houses into saunas and baby video monitors broadcasting over the internet. This list can only increase because while IoT services should be intuitive and simple to use, the delivery ecosystem is a myriad of potential problems as IoT explodes complexity. So finding a performance issue is like finding the proverbial needle in the haystack.
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life sett...
The WebRTC Summit New York, to be held June 6-8, 2017, at the Javits Center in New York City, NY, announces that its Call for Papers is now open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 20th International Cloud Expo and @ThingsExpo. WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web ...
20th Cloud Expo, taking place June 6-8, 2017, at the Javits Center in New York City, NY, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy.
Internet-of-Things discussions can end up either going down the consumer gadget rabbit hole or focused on the sort of data logging that industrial manufacturers have been doing forever. However, in fact, companies today are already using IoT data both to optimize their operational technology and to improve the experience of customer interactions in novel ways. In his session at @ThingsExpo, Gordon Haff, Red Hat Technology Evangelist, will share examples from a wide range of industries – includin...
WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web communications world. The 6th WebRTC Summit continues our tradition of delivering the latest and greatest presentations within the world of WebRTC. Topics include voice calling, video chat, P2P file sharing, and use cases that have already leveraged the power and convenience of WebRTC.
"We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at 20th Cloud Expo, Ed Featherston, director/senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
"Once customers get a year into their IoT deployments, they start to realize that they may have been shortsighted in the ways they built out their deployment and the key thing I see a lot of people looking at is - how can I take equipment data, pull it back in an IoT solution and show it in a dashboard," stated Dave McCarthy, Director of Products at Bsquare Corporation, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
IoT is rapidly changing the way enterprises are using data to improve business decision-making. In order to derive business value, organizations must unlock insights from the data gathered and then act on these. In their session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, and Peter Shashkin, Head of Development Department at EastBanc Technologies, discussed how one organization leveraged IoT, cloud technology and data analysis to improve customer experiences and effici...
Fact is, enterprises have significant legacy voice infrastructure that’s costly to replace with pure IP solutions. How can we bring this analog infrastructure into our shiny new cloud applications? There are proven methods to bind both legacy voice applications and traditional PSTN audio into cloud-based applications and services at a carrier scale. Some of the most successful implementations leverage WebRTC, WebSockets, SIP and other open source technologies. In his session at @ThingsExpo, Da...
"IoT is going to be a huge industry with a lot of value for end users, for industries, for consumers, for manufacturers. How can we use cloud to effectively manage IoT applications," stated Ian Khan, Innovation & Marketing Manager at Solgeniakhela, in this SYS-CON.tv interview at @ThingsExpo, held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA.
As data explodes in quantity, importance and from new sources, the need for managing and protecting data residing across physical, virtual, and cloud environments grow with it. Managing data includes protecting it, indexing and classifying it for true, long-term management, compliance and E-Discovery. Commvault can ensure this with a single pane of glass solution – whether in a private cloud, a Service Provider delivered public cloud or a hybrid cloud environment – across the heterogeneous enter...
The cloud promises new levels of agility and cost-savings for Big Data, data warehousing and analytics. But it’s challenging to understand all the options – from IaaS and PaaS to newer services like HaaS (Hadoop as a Service) and BDaaS (Big Data as a Service). In her session at @BigDataExpo at @ThingsExpo, Hannah Smalltree, a director at Cazena, provided an educational overview of emerging “as-a-service” options for Big Data in the cloud. This is critical background for IT and data professionals...
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
@GonzalezCarmen has been ranked the Number One Influencer and @ThingsExpo has been named the Number One Brand in the “M2M 2016: Top 100 Influencers and Brands” by Onalytica. Onalytica analyzed tweets over the last 6 months mentioning the keywords M2M OR “Machine to Machine.” They then identified the top 100 most influential brands and individuals leading the discussion on Twitter.
What happens when the different parts of a vehicle become smarter than the vehicle itself? As we move toward the era of smart everything, hundreds of entities in a vehicle that communicate with each other, the vehicle and external systems create a need for identity orchestration so that all entities work as a conglomerate. Much like an orchestra without a conductor, without the ability to secure, control, and connect the link between a vehicle’s head unit, devices, and systems and to manage the ...
More and more brands have jumped on the IoT bandwagon. We have an excess of wearables – activity trackers, smartwatches, smart glasses and sneakers, and more that track seemingly endless datapoints. However, most consumers have no idea what “IoT” means. Creating more wearables that track data shouldn't be the aim of brands; delivering meaningful, tangible relevance to their users should be. We're in a period in which the IoT pendulum is still swinging. Initially, it swung toward "smart for smar...
In an era of historic innovation fueled by unprecedented access to data and technology, the low cost and risk of entering new markets has leveled the playing field for business. Today, any ambitious innovator can easily introduce a new application or product that can reinvent business models and transform the client experience. In their Day 2 Keynote at 19th Cloud Expo, Mercer Rowe, IBM Vice President of Strategic Alliances, and Raejeanne Skillern, Intel Vice President of Data Center Group and G...