How to Create a High Performing API: A New Perspective for 2016
by Bob Reselman

Performance is the elusive butterfly of API development. Everybody is intrigued with its beauty, yet few know how to capture it.

In the old days, many shops ensured a performant API by writing some code and then throwing it over the wall to QA for load testing. Later, some integration testing took place. As long as the API worked and met some marginal performance benchmarks, things were good.

This worked well when a public, HTTP-based API consumed by a wide variety of distributed devices was more the exception than the rule. Today, however, APIs are a big deal and they are everywhere, so much so that companies are posting very big infographics prominently on the front page of the New York Times to raise awareness of the technology among the general public.

This is good news.

The rapid growth and increasing popularity of API use is causing a lot of companies to look inward, to take new views on API performance. Code, load test, and publish won't do any longer. Companies are doing more. They are looking beyond the HTTP entry points.

Today the whole technical stack upon which an API sits is grist for the performance mill.

Look to the data
One of the most interesting discoveries I've made when talking to people who publish large-scale APIs is how critical underlying data structures and data architecture are to the overall picture. Diamond DevOps is a company that does a lot of work on both sides of the API fence, consuming APIs and publishing them. I talked to one of its key technical people, Diego Woitasen (@DiegoWoitasen), co-founder and tech lead, about what he looks for when considering API performance. He came back with two words: database indexes.

Diego's take is that less experienced database developers will often throw indexes on a database to speed up reads without considering the impact on writes. To quote Diego:

We took an app from a client that we were to refactor, but in the meantime we needed to keep the old app running. We discovered that there were 10 to 15 tables and more than 100 indexes. Indexes affect write performance and in this case the app was used to collect data mostly. Using so many indexes was a really bad choice. You can add indexes for apps that have more read operations than write operations.

Separating read functionality from write functionality at the database level can be a critical design decision when it comes to API performance.
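Diego's point about indexes and writes can be tried out in a small experiment. What follows is a minimal sketch, not a benchmark of his client's system: it assumes an in-memory SQLite database, a hypothetical `readings` table and hypothetical index names, and simply times the same bulk insert with and without indexes.

```python
import sqlite3
import time


def timed_insert(index_count: int, rows: int = 20_000):
    """Insert `rows` rows into a table that carries `index_count` indexes.

    Returns (elapsed_seconds, indexes_present, rows_stored).
    The table, column, and index names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
    )
    columns = ["a", "b", "c", "d"]
    for i in range(index_count):
        # Every extra index is one more structure to update on each write.
        conn.execute(f"CREATE INDEX idx_{i} ON readings ({columns[i % 4]})")

    data = [(n, n * 7 % 1000, n * 13 % 1000, n * 17 % 1000) for n in range(rows)]

    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", data)
    elapsed = time.perf_counter() - start

    indexes_present = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index'"
    ).fetchone()[0]
    rows_stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    conn.close()
    return elapsed, indexes_present, rows_stored


if __name__ == "__main__":
    for count in (0, 12):
        seconds, _, _ = timed_insert(count)
        print(f"{count:2d} indexes: {seconds:.3f}s for the same inserts")
```

The absolute numbers will differ from machine to machine and from database to database, so treat the output as a direction rather than a measurement. The idea is only that a write-heavy table pays for every index it carries.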

Using denormalization to separate read from write functionality proved to be a big win in terms of API performance for Dmytro Seredenko (@dseredenko), Senior Director of North American Business at EPAM Systems. According to Dmytro:

We had a requirement to expose aggregated data on visitors through the API, sliced in multiple dimensions. The underlying system was a reporting component (RDBMS) that was fed by the data from a Map-Reduce job. ... it worked pretty slowly....

So we had to denormalize aggregated data stored in the Reporting RDBMS so the data could be queried quickly without complex joins. It (denormalizing) did increase the performance significantly. Since our API was read-only, we horizontally scaled RDBMS through adding read-only nodes.
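To make the idea concrete, here is a small sketch of a denormalized read model. It is not Dmytro's schema: the `visitors`, `visits`, and `visits_by_country_day` tables are hypothetical, and SQLite stands in for the reporting RDBMS. A background job does the join once, and the read path used by the API becomes a single-table lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visitors (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE visits (visitor_id INTEGER, day TEXT, pages INTEGER);
    -- Denormalized read model: one row per (country, day), no joins needed.
    CREATE TABLE visits_by_country_day (
        country TEXT,
        day TEXT,
        total_pages INTEGER,
        PRIMARY KEY (country, day)
    );
    """
)
conn.executemany(
    "INSERT INTO visitors VALUES (?, ?)", [(1, "US"), (2, "US"), (3, "DE")]
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "2016-01-01", 5),
        (2, "2016-01-01", 3),
        (3, "2016-01-01", 4),
        (1, "2016-01-02", 2),
    ],
)


def rebuild_read_model(db: sqlite3.Connection) -> None:
    """Background job: do the expensive join once and store the result."""
    with db:
        db.execute("DELETE FROM visits_by_country_day")
        db.execute(
            """
            INSERT INTO visits_by_country_day
            SELECT v.country, s.day, SUM(s.pages)
            FROM visits AS s
            JOIN visitors AS v ON v.id = s.visitor_id
            GROUP BY v.country, s.day
            """
        )


def pages_for(db: sqlite3.Connection, country: str, day: str) -> int:
    """API read path: a primary-key lookup on the denormalized table."""
    row = db.execute(
        "SELECT total_pages FROM visits_by_country_day "
        "WHERE country = ? AND day = ?",
        (country, day),
    ).fetchone()
    return row[0] if row else 0


rebuild_read_model(conn)
```

The trade-off is that the read table is only as fresh as the last rebuild, which tends to be acceptable for a read-only reporting API like the one described.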

You can have lightning fast web servers in play up at the endpoints, but if you're not getting the data you need, when you need it, your performance will suffer. Data architecture really does matter. However, data design is not the only consideration. Workflow process comes into play often.

It's the use case
A common scenario in API usage is what I call "a lot of state definition in, a lot of data back."

In this type of situation, you have an API that requires you to submit a lot of information about the use case at hand. The API will do a boatload of processing on that information and return a lot of data. I've experienced cases in the casting industry in which an agent will have to submit hundreds of actors for a given role and the API will have to process all of that information. Once processed, a lot of information about that submission is returned. The submission data is large, the processing is laborious, and the data return can be big too.

How to address this issue? To quote Dmytro Seredenko again, "It's important to keep the dialog."

Dmytro and others propose that in certain cases, it's useful to segment processing via a number of API endpoints and to provide callback information when certain background processes complete.

Those of us who have posted video for processing on the Internet are familiar with the pattern. You submit your video and then, once the upload is complete, the site will send you an email indicating your video is ready for viewing. Granted, email notification is a pretty primitive way to transmit state information via callback. But it is consistent with the conversation pattern.

Typically, as a site improves processing speed, email callback gets eliminated. But getting an email is a far sight better than having a user sitting in front of a screen watching a spinning dial for tens of minutes on end.

Understanding the services your API is to deliver and figuring out how to design an architecture that segments processing into a series of dialog-like API calls will improve the overall performance of the API experience.
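One way to picture the dialog is a submit call that returns right away and a callback that fires when the background work completes. The sketch below is an illustration under assumptions: the function names are hypothetical, a thread pool stands in for the background processing tier, and a plain list stands in for the email or webhook that would carry the callback.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the background processing tier (an assumption for this sketch).
executor = ThreadPoolExecutor(max_workers=2)

# Stand-in for the email or webhook channel that carries the callback.
notifications = []


def process_submission(actors):
    """The slow part of the work, kept away from the API entry point."""
    return {"count": len(actors), "actors": sorted(actors)}


def notify(job_id, result):
    """Callback: tell the consumer that the job has finished."""
    notifications.append((job_id, result))


def submit(actors):
    """API entry point: hand off the work and return a job id immediately."""
    job_id = str(uuid.uuid4())
    future = executor.submit(process_submission, actors)
    # The callback runs once the background work has completed.
    future.add_done_callback(lambda done: notify(job_id, done.result()))
    return job_id
```

A caller would hold on to the returned job id and carry on with other work. The notification arrives later, which is the same conversation the video upload example describes.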

Still, what do you do in situations where you keep finding yourself submitting a lot of information to an API in order to get work done? This is where the notion of state caching can come into play.

Be essential
Online shopping sites are essentially one big state machine. You have a lot of data in play (customers, inventory, shipments, payments, etc.), all in various states of flux. There are also algorithms reacting to any and all state changes. Online shopping can be an API performance nightmare, with API call upon API call needed to select items to buy, make payment, and then arrange shipment.

The online retailer Nordstromrack.com | HauteLook is confronted with this state problem all the time. The way the company has dealt with the problem is to create a core design sensibility which all developers are to follow. Raj Murali (@rex_thuh_king), Senior Manager of ERP Engineering at Nordstromrack.com | HauteLook, states this principle simply:

"The fastest API is one that has to do NOTHING."

Raj and his team have devised a way in which a significant load of API work is done by background processes that store information in a distributed cache. In many cases, the work the API does is nothing more than checking the cache to determine the state of the given process. Also, their code takes full advantage of the HTTP response code standard. When a process is started via an API call, a 202 Accepted response code is returned. Later on when an API call needs to know if a process is complete, a 200 OK response code is delivered.
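The flow Raj describes can be sketched in a few lines. This is not the Nordstromrack.com | HauteLook code: the function names are hypothetical, a dictionary stands in for the distributed cache, and a list stands in for the job queue. Returning 202 for a status check on a job that is still pending is also an assumption, since the description above only covers the start call and the completed check.

```python
import uuid
from http import HTTPStatus

cache = {}         # stand-in for a distributed cache (assumption)
pending_jobs = []  # stand-in for a queue feeding the background workers


def start_process(payload):
    """API call that starts work: record the job, queue it, answer 202."""
    job_id = str(uuid.uuid4())
    cache[job_id] = {"state": "pending", "result": None}
    pending_jobs.append((job_id, payload))
    return HTTPStatus.ACCEPTED, {"job_id": job_id}


def run_background_worker():
    """Background process: do the real work and store the outcome in the cache."""
    while pending_jobs:
        job_id, payload = pending_jobs.pop(0)
        total = sum(payload["items"])  # placeholder for the expensive work
        cache[job_id] = {"state": "done", "result": {"total": total}}


def get_status(job_id):
    """API call that checks state: nothing more than a cache lookup."""
    entry = cache.get(job_id)
    if entry is None:
        return HTTPStatus.NOT_FOUND, {}
    if entry["state"] != "done":
        # Assumption: a job that is still pending answers 202 again.
        return HTTPStatus.ACCEPTED, {"state": "pending"}
    return HTTPStatus.OK, entry["result"]
```

Neither endpoint does any of the heavy lifting itself. Each one reads or writes a single cache entry, which is about as close to doing nothing as an endpoint can get.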

Creating an API endpoint that has essentially one piece of fast, finite work goes a long way to improving API performance. Yes, there is a lot of management to be done on the backend. However, making your API endpoint essential allows you the flexibility to seek performance gains down in the stack. The more work your API has to do, and the more state it has to hold on the web server, the more brittle it becomes. A brittle API may be fast today and slow a week from now.

Putting it all together
As I mentioned at the beginning of this article, there is a whole lot more to creating high performance APIs than coding and load testing. Comprehensive design and analysis all the way through the stack, from database, to workflow process design, all the way up to the HTTP access point, is critical. It's a different way of thinking, a different perspective. There are three fundamental takeaways to remember as we move forward.

First, give a lot of attention to how your API is writing and reading data. Be relentless in squeezing every bit of unnecessary work out of your data infrastructure. As we read above, be very careful about how you use indexes. Separate read databases from write databases and synchronize data accordingly. Denormalize whenever possible. Making each of these things more efficient can add up to significant improvements in performance.

The second is to understand the use of your API as an aggregate of endpoints. Can you define relationships among your API endpoints that have a common semantic meaning? If so, can you make it so that your API endpoints can participate effectively and efficiently in a structured, self-enforcing conversation? Sometimes a lot of back-and-forth transmission between a publisher and a consumer can be more effective than one big, data-heavy interaction with a lot of processing burden.

The third is to have your API get as close to doing nothing as possible. If your application accesses a lot of global state information that is slow moving, can you make it so your API avoids the costly CPU utilization that comes with in-process calculation? Can you use background processes? Can you use a distributed cache to hold slow-moving data that is global to all endpoints? Can you just make a simple call to another endpoint to get the information? Again, you want your API calls to be fast, without having to bear the burden of a lot of real-time processing.

In closing
Consumers want information and services that are accurate and they want them fast. Thus, just to be in the game, your API needs a level of performance that is very high.

Moving beyond the old-school paradigm of code, load test, publish will open new doors in which performance is seen as an important feature of your API and not some after-the-fact consideration. Take a new perspective on API performance. Move beyond the endpoint perspective to one in which your entire system is really the API.

You'll be happy you did. Your customers will be even happier.

More Stories By SmartBear Blog

As the leader in software quality tools for the connected world, SmartBear supports more than two million software professionals and over 25,000 organizations in 90 countries that use its products to build and deliver the world’s greatest applications. With today’s applications deploying on mobile, Web, desktop, Internet of Things (IoT) or even embedded computing platforms, the connected nature of these applications through public and private APIs presents a unique set of challenges for developers, testers and operations teams. SmartBear's software quality tools assist with code review, functional and load testing, API readiness as well as performance monitoring of these modern applications.
