Welcome!

Java IoT Authors: Yeshim Deniz, Pat Romanski, Liz McMillan, Zakia Bouachraoui, Carmen Gonzalez

Related Topics: Java IoT, Industrial IoT, Microservices Expo

Java IoT: Article

i-Technology Viewpoint: Laziness Sometimes Pays

The Gains Made by Better Algorithms Almost Always Outstrip the Gains From Better Hardware

Let me begin by a philosophical rant. There is a motto from scientific computing that carries to many areas of computer science:

/The gains made by better algorithms almost always outstrip the gains from better hardware./

I've frequently seen where algorithm improvements pay by factors of tens to tens of thousands in CPU time. One change I made in a numerical algorithm improved CPU requirements by a factor of 50,000: from weeks on a super-computer to minutes on a workstation.

Any business-savvy engineer knows that algorithm improvements come at a price: the engineer's time. Striking that balance makes software systems move forward rather than staggering to a halt in bloat and dysfunction. It also helps to use people who actually know what they are doing: knowing how to compile code doesn't make you a software engineer any more than knowing how to spell makes you a writer. End of rant.

On to (rant related) business. On most Web sites, think of how many times a data source will be used to retrieve the same data and produce the same content over and over again. Most successful services deliver a highly redundant amount of information to their users. For example, the JDJ website will deliver this (same) content to perhaps a hundred thousand users. If the servers are overtaxed, customers will experience significant delays or malfunctions.

There are several useful solutions to this. Well configured caching proxy servers come to mind, although server-side scripting make this difficult. Buying more hardware will eventually fix the problem, which may be the correct business solution.

But what about asking programmers to be a little more lazy?

For this article I've included the source for the LazyFileOutputStream. It acts just like a regular FileOutputStream except that, if created on a file that already exists, it /reads/ the data from the file instead of writes it. The stream compares what is already in the file with what you are currently writing to it. If at any point it sees there is a difference in the data you are writing this time compared to what is already there, the stream automatically switches to a write-mode that writes over the remainder of the file with the changes.

The upshot is, if your program generates the same output twice, the output file is unmodified the second time (leaving the original modification date). First, by simply changing FileOutputStream to LazyFileOutputStream, any downstream processing can use timestamp information on the files to check if they need to do anything at all. If the timestamp hasn't changed, then neither has the contents.

But wait, there's more! In addition to the standard close(), the LazyFileOutputStream also supports abort(). This method effectively states "I'm done now, leave the rest of the file alone." The remainder of the file will be the same, even without reproducing it. This means that, if you determine at an early point in the processing of the file that it's going to turn out the same, you can simply abort() to leave it alone. Its similar to the idea of not changing the modification dates on files which are rewritten with the same data, but allows for saving CPU time for the current process step as well as downstream processing..

Certain engines produce part of the template before you can conveniently intervene to decide if you really need to regenerate it. By opening up the output as a Lazy file, you can just abort() early and have the old version, with the the old modification time, around for downstream processing.

Okay, rant concluded and point made: CPUs around the planet are spinning through the same data tens of thousands of times producing the same content tens of thousands of times. Instead of buying great big servers to manage this, a smart caching policy based on lazy file writers and some modification time testing could save some sites that same wild-sounding factor of 50,000. Without having to buy 50,000 new servers.

Anecdote # 1. There is a certain technical advantage to this style of writing data as well: most storage devices are easier to read from than write to, adhering to the 80/20 rule: 80% of file access will be reads, 20% will be writes.. The LazyFileOutputStream takes advantage of that for the many files which are simply rewritten with the same content.

Anecdote # 2. There must be a few curled toes out there saying to themselves, "Why not LazyFileWriter?" There are good technical reasons for the OutputStream: the logic of the data written must be checked in its raw /byte/ format for the idea to work correctly, and you can always wrap this in an OutputStreamWriter, followed by a BufferedWriter, which is what I recommend.

Now I'm even done with the anecdotes. Have a nice day.

More Stories By Warren MacEvoy

Warren D. MacEvoy is Asst Professor of Comp Science in the department of Computer Science, Mathematics & Statistics at Mesa State College, Grand Junction, Colorado.

Comments (8)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


IoT & Smart Cities Stories
Dion Hinchcliffe is an internationally recognized digital expert, bestselling book author, frequent keynote speaker, analyst, futurist, and transformation expert based in Washington, DC. He is currently Chief Strategy Officer at the industry-leading digital strategy and online community solutions firm, 7Summits.
Digital Transformation is much more than a buzzword. The radical shift to digital mechanisms for almost every process is evident across all industries and verticals. This is often especially true in financial services, where the legacy environment is many times unable to keep up with the rapidly shifting demands of the consumer. The constant pressure to provide complete, omnichannel delivery of customer-facing solutions to meet both regulatory and customer demands is putting enormous pressure on...
IoT is rapidly becoming mainstream as more and more investments are made into the platforms and technology. As this movement continues to expand and gain momentum it creates a massive wall of noise that can be difficult to sift through. Unfortunately, this inevitably makes IoT less approachable for people to get started with and can hamper efforts to integrate this key technology into your own portfolio. There are so many connected products already in place today with many hundreds more on the h...
The standardization of container runtimes and images has sparked the creation of an almost overwhelming number of new open source projects that build on and otherwise work with these specifications. Of course, there's Kubernetes, which orchestrates and manages collections of containers. It was one of the first and best-known examples of projects that make containers truly useful for production use. However, more recently, the container ecosystem has truly exploded. A service mesh like Istio addr...
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...
To Really Work for Enterprises, MultiCloud Adoption Requires Far Better and Inclusive Cloud Monitoring and Cost Management … But How? Overwhelmingly, even as enterprises have adopted cloud computing and are expanding to multi-cloud computing, IT leaders remain concerned about how to monitor, manage and control costs across hybrid and multi-cloud deployments. It’s clear that traditional IT monitoring and management approaches, designed after all for on-premises data centers, are falling short in ...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
Dynatrace is an application performance management software company with products for the information technology departments and digital business owners of medium and large businesses. Building the Future of Monitoring with Artificial Intelligence. Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more busine...